FOREWORD



The study presented here is the complete report on the pilot
phase of Research Project 2180, amended, under a contract between
the United States Office of Education and the State College of
Iowa. The data for the report on the second phase have been
gathered, and the report itself will be forthcoming in 1967.

Though more complete acknowledgements of assistance may be 
made in the final report, it is not too early for the investiga- 
tors to express their appreciation to the many individuals and 
organizations which have assisted in bringing it to fruition. 
First among these must be Dr. J. W. Maucker, President of the
State College of Iowa; Dean William C. Lang; Dr. H. W. Reninger,
Head of the Department of English Language and Literature; Dr.
Marshall Beard, Registrar; and Mr. Paul Mahon, until recently in
charge of Data Processing. Had the administration of the college
not had the courage to allow students to omit a course frequently
considered to be vital to their success in college and in life,
this project could not even have begun.

We have also enjoyed the cooperation of the College Entrance
Examination Board, which made available the CEEB English
Composition Test, one of our three test instruments.

Last, we owe a great debt of gratitude to the students who 
participated in the investigation. They were at all times coop- 
erative and helpful, whether they were in the experimental or the 
control group. We hope that the impact of the findings here 
reported will justify their cooperation. 



PURPOSES AND PROCEDURES 



Statement of the Problem 

Research in college composition has not been plentiful, and
most of the studies reported have concentrated on comparing some
innovation with a standard procedure. Variables have ranged from
the number of papers written through the amount of teacher
comment on each paper to the influence of such subjects as
rhetoric and grammar on the performance of the student. In every
case the other element in the comparison was the particular
arrangement of freshman composition at the institution in which
the research was done. Seldom has a statistically significant
difference appeared, and the difficulty is that, even where it
has, the difference has been between a particular innovation and
what might be termed standard procedure. A tacit assumption in
all such research has been that the "standard" course improved
student writing and the question was whether the innovation would
produce a result different from that produced by the standard
course. These investigations seldom included comparisons of the
results with an arrangement involving no formal instruction in
English composition.

A second difficulty with the research reported has been that
the statistical comparisons involved a relatively small number of
students. The question is always present as to whether the
sample employed is sufficiently large and broadly based to be
reasonably representative of a given group, for example, all
entering college freshmen in a substantial number of American
colleges. In those few instances in which a statistically
significant difference has been found, the degree to which
generalizations beyond the samples investigated may be made is
uncertain.



The present investigators decided to attempt to overcome
both of these deficiencies. They planned to compare students who
had received no instruction of the sort generally given in
freshman composition with comparable students who had received such
instruction. In order to develop statistics for a reasonably
broad and a reasonably diverse population, they planned to engage
several institutions in replicating the experiment. This
procedure would give a numerical, geographical, and academic variety
to the population. If the results at all participating
institutions were in agreement, the conclusions could be stated with
considerable force. If the results among the institutions
varied, directions for future investigation might be indicated.

The goals of the investigation, then, were to test two
hypotheses:



(1) That the writing performance of the students enrolled in 
a freshman composition sequence is not significantly 
different from the writing performance of students not
enrolled in a freshman composition sequence when the two 
groups have been in college for an equal length of time. 

(2) That the results obtained in (1) will be present in 
other colleges or universities. 

A by-product of the testing of the hypotheses would be the 
accumulation of statistics based upon a reasonably large and 
diverse sample of students who had received no instruction in 
college freshman composition. Such a set of statistics might 
prove useful in providing a realistic and stable base for
investigating the effect of innovation as well as of the "standard"
course itself. Meaningful use of these statistics could be made 
only if the investigators testing an innovation utilized the 
evaluative instruments employed in the present investigation. 



Pilot Phase 

The present report covers the pilot phase of a two-phase 
project. It is based upon experiences at the State College of 
Iowa from September, 1963, through May, 1965. The second phase
of this project will involve the performance of students at five
institutions: the University of Colorado, the University of
Iowa, Kent State University, Northern Illinois University, and
the State College of Iowa, from September, 1964, to May, 1966.



Related Research 

No research has come to the investigators' attention which
is directly comparable to the present study. Nearly all the 
research compares some innovation with a standard procedure. 

Such studies ordinarily vary the frequency of writing in the
composition course as the experimental variable. Most of these
obtained no statistically significant differences in the
performance of the groups of students at the end of instruction. A
summary of projects with some relevance to the current study is 
given below. 



Arnold, Lois. Effects of Frequency of Writing and Intensity of
Teacher Evaluation upon Performance in Written Composition
of Tenth Grade Students (Cooperative Research Project Number
1525). Tallahassee: Florida State University, 1963.
University Microfilms No. 63-6344.



Miss Arnold conducted her research in 1961-1962 in
Florida high schools, in each of which a teacher was scheduled to
teach four groups of students in the tenth grade. The four
groups at each school were average classes, determined by
sectioning on the basis of scores on the following tests:
General Ability Test, Metropolitan Achievement Battery, School
and College Ability Test, and Differential Aptitude Tests.
Students were classified as low average, middle average, or high
average on the basis of the SCAT scores. Nothing is said of
student-to-student matching. The experiment lasted for the
school year. Each teacher at each school used four
methods, a different one for each of her four classes, as follows:

1. Infrequent writing, moderate evaluation: one theme,
approximately 250 words, each six weeks. Evaluation was
concentrated on one matter each time: once on
structure, once on organization, etc.

2. Frequent writing, moderate evaluation: 

times a week, varying from two sentences to two pages or 
more. The evaluation was handled as in 1 above. 

3. Infrequent writing, intensive evaluation: one theme each
six weeks, approximately 250 words. Every error in
usage, sentence structure, and mechanics was marked and
detailed comments written on the paper. Students
corrected all errors, revised or rewrote until the paper was
satisfactory.

4. Frequent writing, intensive evaluation: one 250-word
theme weekly, evaluated meticulously as in 3 above
(pp. 40-2).

Two evaluative instruments were used, the STEP Essay Tests and
the STEP Writing Tests, the former a writing test, the latter an
objective test. Both were administered at the beginning and at
the end. Three experienced (former) English teachers
independently rated the STEP Essay Tests, the pretests in December and
January, and the post-tests in May and June.

Miss Arnold reached four conclusions:

1. There is no assurance that intensive evaluation is any
more effective than moderate evaluation in improving the
quality of written composition.

2. It must not be assumed that frequent practice is in 
itself a means of improving writing. 

3. There is no evidence that any one combination of fre-
quency of writing and intensity of evaluation is more 
effective than another. 

4. There is no indication that frequent writing and
intensive evaluation are any more effective for one ability
level than are infrequent writing and moderate
evaluation (p. 62).

In this study there was no significant difference between the
sexes.

The SCI investigators wonder whether graders might have
evaluated more alike had they conferred on an occasional paper
(four correlations were in the .50's, the others being .62 and
.76), and why, in a gains study, all themes were not scored at a
single time with prethemes and post-themes mixed. A table
showing comparisons of the terminal data only would also have been
helpful. That is, how did the groups compare at the end
regardless of gains?



Buxton, Earl W. "An Experiment to Test the Effects of Writing
Frequency and Guided Practice upon Student's Skill in
Written Expression." Unpublished Ph.D. dissertation,
Stanford University, 1958. University Microfilms 58-3596. [As
reported in Braddock, et al. Research in Written
Composition. Champaign, Illinois: NCTE, 1963, pp. 58-70.]

This experiment involved 257 students in the University of 
Alberta who were enrolled in a special "one-year 'emergency' 
course designed to train teachers for Alberta schools." All 257, 
who constituted the entire enrollment in the emergency program, 
carried the same courses (a "canned" schedule). The total group 
was divided into six classes: two control classes, in which
students did no extra, out-of-class writing; two writing classes,
in which students wrote a 500-word paper each week as an extra 
out-of-class assignment for a total of sixteen weeks; two 
revision classes, in which students did the same amount of 
writing on the same assignments as the writing classes. Writing 
classes were not required to write on the assigned topic and 
received only a brief paragraph of teacher comment at the end of 
each theme; there was no marking of errors nor commenting in the 
margin, and students were not asked to do anything with the 
papers after getting them back. The revision classes were 
required to write on the assigned topic and papers were marked in 
terms of unity, organization, logic, correctness, and such 
matters, with a general comment at the end. Students in the 
revision classes were asked to correct and revise their papers in
class on the day the papers were returned and discussed. The 
teacher was present to give aid. 

Criterion measures were two parts of an earlier edition of
Cooperative English Tests: "Mechanics of Expression" and
"Effectiveness of Expression" (alternate forms before and after),
and a theme. Each of two readers assigned a "content" score and
an "error" score to each theme. The content score was based on
fifteen factors with some factors weighted more than others. A
maximum potential score was allotted for each factor. Each
reader determined how much of that maximum to assign to that
factor in each paper. The error score was determined by counting
errors in spelling, punctuation, or mechanics. The points
assigned for each of the fifteen factors in a paper by each
reader were added; then the count for errors was subtracted from
that. The scores for the two readers were averaged, and that
mean was arbitrarily divided by three to get a usable scaled
score.
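
The scoring arithmetic just described can be sketched in Python.
This is only an illustration under stated assumptions: the
function names and the sample numbers are hypothetical, the
example is shortened to three factors, and the report does not
give Buxton's actual factors or weights.

```python
def reader_score(factor_points, error_count):
    """One reader's raw score: summed factor points minus the error count."""
    return sum(factor_points) - error_count


def scaled_theme_score(reader_a, reader_b):
    """Average the two readers' raw scores, then divide the mean by three."""
    return (reader_a + reader_b) / 2 / 3


# Hypothetical ratings, shortened to three factors for brevity
# (Buxton used fifteen weighted factors).
a = reader_score([8, 6, 7], error_count=3)
b = reader_score([9, 5, 7], error_count=3)
example_score = scaled_theme_score(a, b)
```

Because the error count is subtracted before averaging, a paper
with many mechanical errors can lose a large share of its content
points under this scheme.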

The results of Buxton's study show that the revision
students--those whose papers were carefully marked and who were
required to revise them--made a significantly greater gain in
writing achievement as measured by the themes during the seven
months of the study than did the writing students--those who
wrote the papers but did not revise them. There was a more
significant difference in gain scores between the revision students
and the control students, who wrote none of the themes; this
difference favored the revision students. Concomitant
conclusions: theme ratings are reliable if the raters are thoroughly
practiced in their system and frequently check on what they are
doing, and (since there was no significant difference between the
groups on the objective test scores) the theme ratings in this
study measure something that the particular objective test used
did not measure.

It is not clear whether the division into groups took into 
account the balance of men and women. If, for example, the 
revision classes had more women than either of the other two 
groups, that could affect the results. 



Keys, Frank, Jr. "The Theme-a-Week Assumption: A Report of an
Experiment," English Journal, 51 (May 1962), 320-22.

This experiment dealt with varying the amount of writing and
the amount of reading in high school English classes. Two
classes in each of the four high school grades were "as closely
matched as was possible under the normal sectioning practices of
the school." The two classes in each grade were taught by the
same teacher; one was designated as the writing class and the
other as the reading class. Students in each writing class wrote
a theme a week. After it was closely graded, the students
corrected or rewrote it. Students in each reading class wrote a
theme every three weeks, and spent one class day a week reading
books of their own choice. Nothing is said concerning grading or
rewriting of the reading-class papers. Evaluation instruments
consisted of the STEP writing test and a theme, one of each
administered at the beginning and at the end of the experiment.
The themes were evaluated by three ETS readers using a
nine-point scale.

The students in reading classes achieved a slightly greater
improvement in writing scores than did those in writing classes.
Generalizations arrived at by the investigator:

1. Frequent writing practice probably yields greater
dividends in grade 12 than in grades 9, 10, 11.

2. Frequent writing practice probably yields greater
dividends with low groups than with middle or high groups.

3. Frequent writing practice with low groups probably yields
greater dividends within the area of content and
organization than within the area of mechanics or of diction
and rhetoric.

4. The claim that "the way to learn to write is to write" is
not substantiated by this experiment.

5. The claim that ability to write well is related to the
amount of writing done is not substantiated by this
experiment.

6. For many students reading is a positive influence on
writing ability.

7. The influence of reading on the ability to write appears
to be a separate factor, not directly related to the
teacher's personality and enthusiasm (p. 322).

It is not clear how the fourth generalization is supported 
by the experiment. Since all students in the experiment wrote 
themes, how can it be inferred that the data failed to support 
the notion that students learn to write by writing? Furthermore, 
Keys does not indicate whether the improvement mentioned was
statistically significant.



Kincaid, Gerald L. "Some Factors Affecting Variations in the
Quality of Students' Writing." Unpublished Ed.D. dissertation
(Michigan State University, 1953). University Microfilms
No. 5922.

This experiment attempted "to determine whether a single
paper written on a given topic at a particular time [italics
Kincaid's] can be considered as a representative sample of his
[the student's] writing ability--and thus provide a valid basis
for evaluating ability at any time in a writing course." It is
of interest, not because it deals with a directly related
problem, but because it has implications for any study using theme
readers to evaluate results. A group of 80 college students was
divided into four subgroups, each of which wrote two papers in
one two-hour session on the same day and another two papers in a
similar session a week later. Three topics were used: Groups A
and C wrote on topics 1 and 2 each time (both argumentative);
groups B and D wrote on topics 1 and 3 each time (one
argumentative, one expository). Groups A and B wrote each time without
examination pressure (papers not counted toward grade); groups C
and D wrote without pressure once, and with it the other time
(papers counted on term grade the first time and not counted on
term grade the second time). Papers were rated by three
instructors selected from the freshman staff, the rating being made on
a ten-point scale (1 unsatisfactory, 10 superior) on each of five
categories: grammatical conventions, sentence structure, diction,
organization, and content. The score for a paper could lie
between 10 and 50; it was determined by computing the mean of the
two closest ratings; if the two extreme ratings were equidistant
from the middle rating or if the two closest ratings were more
than five points apart, the mean of all three was used.
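
The rule for combining the three ratings can be sketched in
Python. This is a minimal illustration, not Kincaid's own
procedure as documented: the function name is hypothetical, and
it is an assumption here that the rule was applied to each
rater's total for a paper rather than category by category.

```python
def kincaid_paper_score(ratings):
    """Combine three raters' totals for one paper.

    Uses the mean of the two closest ratings, falling back to the
    mean of all three when the extremes are equidistant from the
    middle rating or the closest pair is more than five points apart.
    """
    low, middle, high = sorted(ratings)
    lower_gap = middle - low
    upper_gap = high - middle
    if lower_gap == upper_gap or min(lower_gap, upper_gap) > 5:
        return (low + middle + high) / 3
    if lower_gap < upper_gap:
        return (low + middle) / 2
    return (middle + high) / 2
```

For example, ratings of 30, 32, and 40 would be scored from the
closest pair (30 and 32), while ratings of 20, 30, and 40 are
equidistant and would be scored from all three.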

Kincaid drew the following conclusions from this study: 

1. . . . the findings from this study cast considerable
doubt upon the justification of the customary practice
of using five letter-grades to designate [individual]
achievement in a writing course when a single paper
provides the basis for that designation (p. 97).

2. If an evaluation of over-all or average improvement is 
all that is desired, it can be obtained from a single 
sample of each student's writing for a pre-test and a 
post-test . . . (p. 99). 

3. . . . in order to develop a program for evaluating
individual student improvement in writing (for strong as well
as for weak students), it would be advisable to obtain
several samples of writing by each student--samples of
writing on different topics on the same day and on the
same topics on different days. And such samples should
be obtained for both the pre-test and the post-test
(p. 99).

Two matters impress the present investigators: 1) The theme
topics used by Kincaid were simpler than those used in the State
College of Iowa investigation. If more difficult topics had been
used by Kincaid the results might have been different. 2) The
findings of the Kincaid investigation support the use of group
average scores on a single pretheme and a single post-theme.



Kreisman, Arthur, et al. Pilot Study in English. Mimeographed
report and dittoed summary of statistics. Ashland, Oregon:
Southern Oregon College, 1963 (no pagination).

This is the report of a pilot study designed "to investigate
techniques and writing skills as a possible means of establishing
the basis for a more extensive research program." It is
interesting because the results led the Oregon investigators to
abandon further experimentation, and because one of those
investigators suggested a study like the State College of Iowa study. In
the Oregon study, both college freshmen and high school students
were involved. Control and experimental groups were matched at
both levels: the college students on the Verbal and Quantitative
scores on SCAT, the total score on SCAT, and the sum of two
ratings on the STEP Essay Test; the 108 high school students on
the score on the California Test of Mental Maturity and the sum
of two ratings on the STEP Essay Test. Both control and
experimental students were in each class. The amount of writing
actually done is not clear. In one place Kreisman says that the
experimental students wrote three themes, the control students
nine themes. He then says that the college-student experiment
lasted for one term, the high school experiment lasted for the
year. He says further that the experimentals wrote once a month,
the controls once a week. Evaluation was based upon comparison
of the STEP essay ratings at the beginning and at the end of the
experiment.

There was no significant difference between the college
experimental and control groups. The results for the high school
groups varied. There was a significant improvement for the
below-average high school students in the control group (more
writing); there was a slight (non-significant) drop in
achievement for the above-average students in the control group (more
writing). There was no significant difference in the experimental
group (less writing). Dr. Cloer, the statistician, wrote: "It
would appear that the principal beneficiaries of the experience
in writing were those subjects of below-average ability or those
who might be called 'under-achievers,' ..."

Comments quoted from Kreisman: 

1. No adequate instrument for testing [composition] seems
available.

2. The difficulty of obtaining a sufficient number of
students to make the experiment valid was one of the
major obstacles.

3. . . . a purely quantitative experiment has little chance
of being valid.

4. . . . one term of writing practice is not sufficient to
form a foundation for judgment regarding the development
of writing ability.

5. . . . frequency may indeed be a factor in the development
of writing ability.

6. . . . all experiments of this nature are of no value and
invalid on an a priori basis.

In the light of the State College of Iowa study, the
following additional comments are of special interest, the first by
Kreisman, the second by Cloer, the statistician: "The emphasis
that we thought might be fruitful [for future research] would be
one which dealt with student-teacher relationships or with
maturation of students regardless of the courses they took," and
"Perhaps a better 'experimental group' would be one that did no
writing (in English classes) over the experimental period."



McColly, William and Robert Remstad. Comparative Effectiveness
of Composition Skills Learning Activities in the Secondary
School (Cooperative Research Project No. 1528). Madison:
University of Wisconsin, 1963.

This study attempts to answer three questions: 

Does more writing alone result in better writing? 

Do more of "functional non-writing composition learning
activities" (practical instruction: working with
student-written papers, emphasizing spelling,
proofreading, revision, etc.; group discussion; teacher
evaluation and comment) result in better writing?

Does tutoring with immediate feedback (having the
teacher present while the writing is being done and
advising the student during the process) result in
better writing? (p. 18)

To answer the first question, dealing with the effect of the 
quantity of writing on improvement in writing, the investigators 
used two classes in the eighth grade and two classes in the ninth 
grade. To answer the questions relating to "functional non- 
writing activities" and immediate feedback (tutoring), three 
classes in each of the tenth, eleventh, and twelfth grades were 
used. Covariance techniques and, to the extent possible, random
selection of samples were employed.

To explore the effect of the amount of writing on
improvement in writing, control classes in the eighth and ninth grades
wrote a theme a month; experimental classes wrote a theme a week.
All other class activities and assignments were the same. During
the year, the eighth-grade control classes wrote 9 themes and the
eighth-grade experimentals wrote 35 themes. The ninth-grade
control classes wrote 6 themes, the experimentals 34.

To study the effect of non-writing activities and tutoring,
one control class (a monthly theme with functional instruction)
and two experimental classes (weekly theme and functional
instruction) were organized at each grade level. About 9
writing tasks with functional activities were completed in the
control classes, about 36 in the experimental classes. There
were no individual conferences or "tutoring" activities in the 
first of these experimental classes in each grade. There were 
about 27 regular "tutoring" sessions in the second experimental 
class in each grade. Thus, a ratio of 4-1 was maintained in
writing tasks with functional activities between the experimental
and control classes.

Criterion and covariate measures for all students in the
experiment included: SCAT (IA, IIA, IIIA), Nelson-Denny Reading,
ITED ("Correctness and Appropriateness of Expression" and
"Ability to Interpret Literature"), previous English GPA,
over-all GPA, and writing samples, two written before the experiment
and two written at the end.

Based on this experiment, the answer to the first question
is no. Results indicated that increase in the amount of writing
by itself has no significant effect upon the writing proficiency
of high school students. Again, based on this experiment, the
answer to the second question is affirmative; the answer to the
third question is negative. Experimental classes with weekly
theme and functional instruction improved significantly compared
to the control classes. The experimental classes with tutoring
scored, at the end of the experiment, about half way between the
control classes and experimental classes without tutoring.



Rohman, D. Gordon and Albert Wlecke. Pre-writing: The
Construction and Application of Models for Concept Formation in
Writing (Cooperative Research Project No. 2174). East
Lansing, Michigan: Michigan State University, 1964.

This is one of the very few studies that have resulted in a
statistically significant difference between control and
experimental groups. Six sections of a college sophomore course in
expository writing with an emphasis on pre-writing activities
constituted the experimental group. Three sections were taught
each quarter for two quarters. The rest of the students enrolled
in the same course (11 sections in the Winter term, 10 in the
Spring term) constituted the control group. The total number of
students involved in the experiment is not disclosed. The
experimental course contained six units: 1. The role of the writer.
2. The escape from category (the concrete rather than the
abstract). 3. The escape from cliché (avoiding someone else's
way or words). 4. Dynamic relationship to the subject (an
urgency to express what the writer has "discovered"). 5.
Concrete analogy (expressing one's "discovery" by comparison with
something like it). 6. Refinement (finishing the essay). Three
major techniques were used: keeping a journal, meditation, and
use of analogy. The control sections were taught as each teacher
wished to teach them, with the exception that all instructors of
the control sections assigned two 500-word themes on topics used
in the experimental sections. These themes were used in the
evaluation.

Evaluation of the experiment involved four devices: 1.
statements written by students in answer to the question: What
did you like or dislike about the course?, 2. statements by the
teachers who taught the course, 3. "objective" evaluation by
readers who did not teach the course, and 4. "subjective"
evaluation by teachers who did not teach the course. No objective
testing was reported.

Evaluation by students was strongly favorable. Major items
were that the course was enjoyed, that it developed freedom in
writing and in the discipline of writing and thinking, that
criticism of student writing led to involvement in the process of
writing, that attitudes toward writing had changed (regarding,
for instance, the relationship between thinking and writing), that
the use of analogy led to greater concreteness and clarity.
Negative criticisms, which were relatively few, included the
following: the course was too short; it was too piecemeal; not enough
grades were given; class criticism was too negative; the journal
was an invasion of privacy; the use of analogy was mechanical.

Instructors gave a number of reactions to the experiment, but
their enthusiasm tended to center on three matters: the journal
as a device to stimulate students to meditate about their
experiences as well as to formulate their meditations in writing, the
emphasis on the pre-writing process, and the freshness and
soundness of the writing done.

The essays for "objective" evaluation were selected from the
total submitted by control and experimental subgroups on the two
topics used by both subgroups. There were 226 experimental and
409 control essays evaluated. No information is given concerning
how these essays were selected. Essays were judged on a
four-point scale: 4, superior; 3, above average; 2, below average; 1,
incompetent. Three standards, unity, coherence, and emphasis,
were guides for the readers. There were 11 readers, four high
school teachers and seven college teachers. They worked in teams
of 8, three who read at the first session not reading at the
second, and three others substituting for them at the second.
Each theme was read twice. About 85% of the grades assigned were
either the same for each theme or only one point different,
indicating that the grading was relatively reliable. The results
showed a statistically significant difference between the
experimental and control groups in favor of the experimentals.

Four members of the English staff not involved in the
experiment read the papers "subjectively." They were given a randomly
selected sample of 50 experimental and 50 control themes. Rohman
and Wlecke informed these readers concerning which set was
experimental and which was control. Some investigators would not have
done that. The readers were asked to answer a series of three
questions: "Which set of essays seems to have more originality
and in what ways? Generally, in which set of essays does it seem
more important for the writers to express themselves and not be
misunderstood? Which set of essays gives the greater sense of
form?" (pp. 130-1) In addition, the readers were asked a series
of specific questions concerning only the experimental essays,
such as: "Do the techniques employed in the experimental
essays--the meditation in the 'Loneliness' essays, and the analogy
in the 'Coming of Age' essays--seem to provide a more coherent
means for the instructor to gauge the success or failure of an
essay?" All four readers gave the experimental group of essays
the higher rating.

Rohman and Wlecke leave so many questions unanswered that
the report is difficult to interpret. How many students were in
each sample? Were the students of the experimental sections
similar in ability to those in the control sections? Did either
sample have appreciably more women than the other? How were the
themes that were evaluated selected? Do the 226 experimental
themes represent a sampling comparable to the 409 control? Would
a sampling of the control students have written as
enthusiastically of their course as the experimentals did? To what degree
did the Hawthorne effect operate? What implications has this
study for composition programs generally?



Sutton, Joseph T. and Eliot Allen. The Effect of Practice and
Evaluation on Improvement in Written Composition
(Cooperative Research Project No. 1993). DeLand, Florida:
Stetson University, 1964.

This study randomly divided college freshmen into five
groups. The first two of these (Groups I and II) served as
controls. During the period of the experiment, these two groups
received no instruction in composition and wrote no papers except
the six criterion themes which provided the "before" performance
and the six criterion themes which provided the "after"
performance. Group I wrote all twelve themes within a four-week
period at the beginning of the semester. Group II wrote the first
six criterion themes the first two weeks of the semester and the
second six criterion themes the last two weeks of the semester.
Groups III through V were the experimental groups, and all wrote
six criterion themes the first two weeks and another six the
last two weeks (as did Group II). In the ten-week interval
between the writing of criterion themes, Group III wrote no
papers but did evaluate four peer papers each week; Group IV
wrote one theme each week which was evaluated by the members of
Group III; and Group V wrote one class theme each week which was
evaluated by a "professor."

Five readers read each theme twice, once to rate it, once to
rank it in an order of excellence relative to the other eleven
themes by each writer. Rankings were based on five criteria:
ideas, mechanics, wording, form, and flavor, each one of which
was scored on a five-point scale. A total for the six "before"
themes for each student as graded by all five graders, divided by
thirty (6 themes x 5 graders) gave an average score for each
writer. The same was done for the six "after" themes, and the
averages were compared.
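
The averaging step described above can be sketched in Python. This
is only an illustration under stated assumptions: the ratings are
invented (they are not data from Sutton and Allen), the function
name is hypothetical, and each grader is assumed to give each theme
a single score.

```python
def average_theme_score(ratings):
    """Average over all (theme, grader) ratings for one writer.

    `ratings` is a list of six lists (one per theme), each holding
    five scores (one per grader), so the divisor is 6 x 5 = 30.
    """
    if len(ratings) != 6 or any(len(theme) != 5 for theme in ratings):
        raise ValueError("expected 6 themes x 5 graders")
    total = sum(sum(theme) for theme in ratings)
    return total / 30


# Invented ratings for one hypothetical writer (not data from the study).
before = [[3, 4, 3, 2, 3]] * 6
after = [[3, 3, 3, 2, 3]] * 6

# A negative change means the "after" themes were rated lower.
change = average_theme_score(after) - average_theme_score(before)
```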

Particularly in relation to the State College of Iowa study,
Sutton and Allen's enterprise is interesting. First, none of the
students in any of the groups received direct instruction in
composition. Such instruction as Groups IV and V received came
from the marks and comments on their papers. Group III gained
experience in editing, though uninstructed in the procedure.
Groups I and II had no experience whatsoever with composition
except the twelve criterion themes. Thus, to a degree this study
is similar to the present one in that no direct instruction in
freshman composition was given and that some of the groups wrote
only the criterion themes. It is different from the present
study in that there was not a direct comparison between those
completing a freshman program of writing instruction and others
not in the freshman English course at all.

The results in the Sutton and Allen study showed an unusual
inconsistency between the themes and the objective tests. In
theme performance, the members of the five groups showed a
significant decline during the experimental period. A decline was
observed for the five groups combined and for each group
separately. This decline was, of course, unexpected. The authors,
in speculating about its source, state: "Unfortunately, it
appears that the very procedure necessary to secure such stability
[among the theme performances] introduced other factors that may
have had a deleterious influence on the results." The frequency
of writing of test themes which were neither returned to the
student nor commented on seems, in the opinion of Sutton and Allen,
to have created an attitude of boredom and impatience among the
students. On each of the two objective tests, the Cooperative
English Tests: English Expression and the College Entrance
Examination Board English Test, the students showed significant
improvement. This was true for the five groups combined, and
there was no significant variation among the five groups in this
respect.



Wolf, Melvin H. Effect of Writing Frequency upon Proficiency in
a College Freshman English Course. (Cooperative Research
Project 2846). Amherst, Massachusetts: University of
Massachusetts, 1966.

This study involved six "regular" sections of college freshman
composition and four remedial sections. Two of the regular
sections, designated experimental-high frequency, wrote 39 themes
in the school year; two sections, designated experimental-low
frequency, wrote 8 themes in the year; two sections, designated
control, wrote 1^ themes in the year, the usual number in freshman
composition at the University of Massachusetts. Two remedial
sections, designated experimental-high frequency, wrote 20 themes
in one semester; the other two, designated control, wrote 8
themes in one semester. These themes were carefully evaluated by
the instructors and were revised and resubmitted by the students.
The objective test used was Cooperative English Tests, Form .
Six themes were used as tests: two written at the start, two at
the end of the first semester, and two at the end of the second
semester. The remedial students, being in the study only one
semester, wrote only the first four test themes. Evaluation of
the test themes was done by 10 instructors under the direction of
an experienced instructor who had been a reader for the
Educational Testing Service. Wolf drew two conclusions: 1) writing
proficiency did not improve with the increase in frequency of
writing, 2) there was a high correlation between the scores on
objective tests of grammar and mechanics and scores of themes as
determined by the reading team. Since COOP has a section on
mechanics and a section on effectiveness but usually yields a
single score, it is not clear how the second conclusion was
arrived at.



Procedure 

The overall design of the pilot project involved selecting
experimental and control groups at the State College of Iowa and
testing them on four different occasions: the beginning of the
freshman year (September, 1963), the end of the first semester
(January, 1964), the end of the first year (May, 1964), and the
end of the second year (May, 1965). Members of the experimental
group received no instruction in freshman composition; members of
the control group did receive instruction in freshman composition.
The performance of these groups was compared at each testing
period to determine whether the differences in their performance on
the criterion measures were significant. Care was taken that the
members of each group would be representative of the total
freshman class entering the State College of Iowa in September, 1963.
Members of both experimental and control groups pursued a normal
academic program except that the experimentals omitted the
freshman composition course. The experimental group substituted other
general education courses for freshman composition and thus took
some of those courses a semester earlier than the control group
did, or carried a course in their majors earlier than most of the
control students did.



Evaluative Instruments



Three tests of performance in composition were used: the
Cooperative English Tests: English Expression (COOP), the
College Entrance Examination Board English Composition Test
(CEEB), and a theme. The first two are objective tests. The
COOP appealed to the investigators because it had been employed
in previous research at the State College of Iowa and seemed to
serve as a reasonably satisfactory indirect measure of student
writing ability. The CEEB, unlike the COOP, is a "secure" test.
It is changed from administration to administration and a serious
attempt is made to assure that students will have no prior access
to any of the test items. It was included in part because of its
greater security and in part because of a high correlation which
had on one occasion been secured between performance on it and
evaluations of writing samples. Following is a list of
the specific test forms employed on the successive testing
occasions:



Testing Date       COOP     CEEB

Sept., 1963        TT       ror
January, 1964      1B       KPL2
May, 1964          1A       KPL1
May, 1965          U        HB02



The COOP contains 90 items--30 on Effectiveness and 60 on
Mechanics. Total time limit is 40 minutes. The CEEB contains from 100
to 110 items and has a total working time of 60 minutes--20
minutes recommended for each of three sections. From test form to
test form the elements tested by the CEEB vary somewhat.
Representative elements include paragraph organization, construction
shifts, sentence correctness, and usage. The various forms of
the test are regarded as equivalent but not parallel.

The theme was a paper written within a two-hour period on a 
single topic provided by the investigators. Students were urged 
to remain for the full two-hour period, though they were allowed 
to leave after an hour and twenty minutes. An explanation of the 
method for selecting topics, a theme instruction sheet, and the 
topics used on the various testing dates are included as Appendix 
A. 



Establishing Matched Pairs 



For comparing the performance of the two subgroups the
investigators used matched pairs instead of the analysis of
covariance technique. A discussion of the pros and cons of using
the matched pairs approach may be found in Appendix B. The
procedure worked out in the following manner.

Enrollment practices at the State College of Iowa made it
necessary to select the members of the experimental subgroup--the
students who would not enroll in freshman composition--before the
beginning of the fall, 1963, registration. To accomplish this,
the investigators selected an experimental pool of 325 students.
Consultation with the Registrar indicated that most students who
enroll in September have been accepted by July 1. He provided
the investigators with a list of the names of these students as
of July 1, 1963. The investigators separated the members of this
group by sex, and within each sex, ranked the students from high
to low in terms of performance, as indicated by standard score on
the English section of the American College Testing Program (ACT).
The goal was to select a group which would include approximately
one-third of the entering freshman class, would contain a ratio
between men and women representative of the total freshman class,
and would reflect the range of performance of that class on the
English section of the ACT. The total number of students who had
applied by July 1, 1963, was 929 (361 male, 39%, and 568 female,
61%). The experimental pool of 325 (38% male and 62% female)
constituted about thirty-five per cent of the total group.

To obtain this group of 325, the investigators assigned a
three-digit number to each of the names on the Registrar's list.
By use of a table of random numbers, they then selected
thirty-five per cent of the students of each sex at each score level.
The resulting list was screened by the Registrar to eliminate
those whose college programs would terminate before the end of
the two-year period of the experiment or who were planning to
attend college only part-time. Any student eliminated by this
screening procedure was replaced on a random basis by another
student of the same sex and ACT score. The 325 students thus
identified constituted the experimental pool.
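
The selection described above is a stratified random draw, and a
minimal sketch of it in Python may help. The roster is invented, the
function name is hypothetical, and a seeded pseudo-random generator
stands in for the investigators' table of random numbers. The
Registrar's screening and the replacement of screened-out students
are not modeled.

```python
import random
from collections import defaultdict


def draw_pool(students, fraction=0.35, seed=0):
    """Draw `fraction` of the students within each (sex, ACT score) stratum.

    `students` is a list of (student_id, sex, act_english) tuples.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student in students:
        _, sex, act = student
        strata[(sex, act)].append(student)

    pool = []
    for key in sorted(strata):
        members = strata[key]
        k = round(fraction * len(members))  # stratum quota
        pool.extend(rng.sample(members, k))  # sampling without replacement
    return pool


# Invented roster: 20 men at one score level, 40 women at another.
roster = [(i, 1, 20) for i in range(20)] + [(100 + i, 2, 22) for i in range(40)]
pool = draw_pool(roster)
```

Because the same fraction is drawn in every stratum, the pool keeps
roughly the sex ratio and score distribution of the full list, which
is the property the investigators were seeking.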

The Dean of Instruction mailed a personal letter to all mem- 
bers of this pool informing them that because they had been 
selected for a special investigation they should not enroll in 
freshman composition during their first two years of college. He 
also invited them to write to the investigators if they had any
questions about their participation or if they desired further
information. It was expected that this letter would encourage
the students to cooperate. Very few students made inquiries and 
none asked to be taken out of the group. Consequently members of 
the experimental subgroup were not volunteers; they were in fact 
selected by the investigators according to the procedure outlined 
here. 



During the orientation period which preceded the beginning
of instruction in the fall, 1963, all entering students were
given tests which included the Project instruments. After the
scores on these tests became available, members of the
experimental pool were paired with students enrolled in the freshman
composition sequence. Students were paired on the basis of sex,
theme score, age, and a score representing combined performance
on the CEEB and COOP.

The matching process may be illustrated from actual data for
three pairs of students. The numerals represent, in order, the
student's sex (1 for male, 2 for female), total theme score (sum
of two ratings), age in years, and combined objective test score.



                      Total                 Sum of Two
Subgroup       Sex    Theme Score    Age    Objective Test Scores

Experimental    1          3          18            77
Control         1          3          18            76
Experimental    1         12          18           140
Control         1         12          17           139
Experimental    2          7          17           101
Control         2          7          17           103



No students were matched unless they were of the same sex, had
the same total theme score, were within one year of each other in
age, and were within three points of one another on combined
objective test scores.
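
The four pairing criteria can be written as a small Python check.
This is a sketch only: the function name is hypothetical, "within
three points" is assumed to include a difference of exactly three,
and the value 140 for the second experimental student is an assumed
reading of a partly illegible entry.

```python
def is_match(exp, ctl):
    """Return True if two students satisfy the four pairing criteria.

    Each student is a (sex, theme_score, age, combined_objective) tuple.
    """
    sex_e, theme_e, age_e, obj_e = exp
    sex_c, theme_c, age_c, obj_c = ctl
    return (
        sex_e == sex_c                  # same sex
        and theme_e == theme_c          # identical total theme score
        and abs(age_e - age_c) <= 1     # within one year in age
        and abs(obj_e - obj_c) <= 3     # within three points (assumed inclusive)
    )


# The three illustrative pairs given in the text (experimental, control).
pairs = [
    ((1, 3, 18, 77), (1, 3, 18, 76)),
    ((1, 12, 18, 140), (1, 12, 17, 139)),  # 140 is an assumed reading
    ((2, 7, 17, 101), (2, 7, 17, 103)),
]
all_ok = all(is_match(e, c) for e, c in pairs)
```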



The combining of the scores of the two objective tests was
accomplished by using the CEEB Standard Rating and the COOP
Converted Score, transforming each into a new standard score on a
scale having a mean of 50 and a standard deviation of 10, and
adding the two resulting transformed scores. Whenever more than
one potential control student qualified as a suitable match for a
given member of the experimental pool, selection was by a random
procedure. The ratio between the number of students in the
experimental pool and the number in the control pool was
approximately one to three.
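
The transformation and summing of the two objective scores can be
sketched as follows. The scores are invented, the function names are
hypothetical, and the use of the population standard deviation is an
assumption; the report does not say which form the investigators
used.

```python
from statistics import mean, pstdev


def to_standard(scores, new_mean=50.0, new_sd=10.0):
    """Rescale raw scores to a scale with the given mean and SD."""
    m = mean(scores)
    s = pstdev(scores)  # population SD: an assumption
    return [new_mean + new_sd * (x - m) / s for x in scores]


def combined_scores(ceeb, coop):
    """Add the two transformed scores student by student."""
    return [a + b for a, b in zip(to_standard(ceeb), to_standard(coop))]


# Invented scores for five hypothetical students (not data from the study).
ceeb = [420, 480, 500, 530, 570]
coop = [152, 158, 160, 163, 167]
combined = combined_scores(ceeb, coop)
```

Since each transformed score has a mean of 50, the combined score
has a mean of 100 over the group on which it is standardized, which
is consistent with the combined values of 76 to 140 shown in the
illustration of matched pairs.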



Theme Evaluation 

Themes were evaluated by teams selected by Fred Godshalk,
Chairman of Test Development in the Humanities at the Educational
Testing Service, from the pool of readers used by the Educational
Testing Service in its theme-reading program. These teams were
used because of their wide experience with theme reading and
because many of the same readers would be used on successive
scoring occasions.

The ETS readers were accustomed to a 4-point scale. The SCI
investigators preferred a 9-point scale. The goal was to employ
a scoring scale which would permit the separation of the themes
into a reasonable number of quality levels without presenting the
evaluators with so many rating categories that undue time would
be consumed in pondering fine distinctions. A compromise was
adopted: a 9-point scale (1 to 9) with emphasis on 2, 4, 6, and
8.



When Mr. Godshalk communicated his standards to the readers,
they were asked to think of the normal curve as split in the
middle, with each segment so created split again halfway between
the median and the extreme. This created four categories: much
below average, below average, above average, much above average.
It did not provide specifically for the average rank. Readers,
already accustomed to the four-point scale, found it easy to use
2, 4, 6, and 8 as their main grades, but they were able also to
use the odd numbers whenever it seemed that a particular paper
had some characteristic requiring a grade between two of the even
numbers. Since each paper was read by two readers and the
ratings summed, the total possible range of scores for a single
paper was from 2 to 18. An explanation of the reading procedure
is given in Appendix C.

It is recognized that the validity of these evaluations
depends upon the degree to which Mr. Godshalk's judgment of
student writing, as modified by discussion with the readers, is
sound. Mr. Godshalk has an unusually wide background in
evaluating the writing of college-bound high school seniors (3).
The readers were from a variety of geographical backgrounds and
a wide range of educational institutions. Mr. Godshalk has for 
years supervised groups of readers like these; the readers have 
worked together as teams in just such reading situations. Though 
neither Mr. Cowley nor Mr. Jewell consistently compared their 
evaluation of sample themes with that of the groups, when they 
did, there was no pronounced disparity between their ratings and 
those of the readers. In the judgment of the investigators, the 
validity of theme evaluations is as high as it is possible to 
achieve in a project of this sort. 



RESULTS 



September, 1963, Test Performance of Original and Persisting
Subgroups

Table I contains basic information regarding the entire
entering freshman class at the State College of Iowa for the
academic year 1963-64. None of the information in the table
involves student performance after September, 1963. The first
line of the table shows the performance for the essentially
complete class of new freshmen (N=910) on seven measures. Each
successive line of the table represents a specified subgroup of
the total group of 910. The data in the table thus permit an
examination of the extent to which the persisting experimental
and control subgroups, composed of matched pairs of students,
remain representative of the parent group.

Line two is of interest as it reveals the extent to which
the representativeness of the samples originally identified in
the summer of 1963 was retained after the actual enrollment of
students in September, 1963. Of the 325 individuals originally
drawn for the experimental subgroup in July, 1963, 284
matriculated. Comparison of lines one and two suggests that the basic
data for the 284 members of the experimental pool agreed closely
with the data for the total freshman class. This close agreement
is noted on each of the seven measures. For example, the
Cooperative English Tests: English Expression (COOP) converted score
mean was l6o.liiJ for the experimental pool and 160.09 for the
total class. Thus the goal of the investigators--to select an
experimental pool which would be representative of the actual
entering freshman class--was achieved. The only aspect in which
a noticeable difference exists between the experimental pool and
the total class is in the slightly smaller percentage of males
found in the experimental pool.

Line three contains the data for the 210 members of the
experimental pool who were paired with control students. Line
four shows the information for the 210 control students. It will
be noted from lines three and four that the experimentals and the
controls, as subgroups, were closely matched. On the COOP the
means were 160.75 and 160.95 respectively. The variability was
similarly close; S.D.'s were 7.99 and 7.63. The means can be
compared to the mean of 160.09 for the total freshman class
displayed in line one.

Lines 5 and 6 of Table I present the September, 1963,
information for complete sets of matched pairs who finished the fall
semester 1963-64 with all data available. Again using the COOP
as an example, it will be noted that the experimental subgroup
mean was 161.28 and the control subgroup mean was 161.43. The
slight selectivity among persisting students as compared to the
entering students is seen here; the two subgroups which finished
a full semester of college obtained COOP means in September which
were slightly higher than the mean of 160.09 of the total
entering freshman class.

The table shows that at the end of the first full academic
year complete data were available for 113 matched pairs of
students. The continuing representativeness is reflected by the
fact that these 113 pairs had, in September, 1963, means of
161.48 for the experimentals and 161.96 for the controls on the
COOP as compared with the mean of the total freshman class
(September, 1963) of 160.09.

At the end of two complete academic years, the number of
remaining matched pairs was 31. Their means on the COOP in
September, 1963, were 162.81 and 162.65. It will be noted that
these means are appreciably higher than the mean of 160.09 for
the parent group of 910. The factor of selectivity is thus
apparent in the somewhat higher means these students achieved in
the fall, 1963, testing. It must be remembered that the test
performances reported in the table are performances at the
beginning of the freshman college year, 1963.



Criterion Scores--September, 1963, through May, 1965

Criterion testing was done at three times--end of first
semester, end of second semester, and end of fourth semester.
The numbers of matched pairs were, respectively, 166, 113, and
31. Basic comparisons of test performance for each of these
three sets of experimental and control students are given:
within subgroups between beginning and final means, and between
subgroups on ending means only. The appropriate means, standard
deviations, r's, and t's will be displayed in the tables.

Table II deals primarily with the criterion tests: the
COOP, the CEEB, and a theme. This table presents a compact
picture of performance in terms of means and standard deviations
on the various criterion tests on the various testing occasions.
(The key comparisons and analyses of the data are shown in Tables
III-XI.)

Whereas in Table I all test scores were those available in
September, 1963, Table II presents the performance of persisting
matched pairs at four successive testing periods beginning with
September, 1963. Examination of this table will reveal the
performance on the criterion measures for the two subgroups at
the beginning of the fall semester, 1963-64; at the end of the
fall semester, 1963-64; at the end of the spring semester,
1963-64; and at the end of the spring semester, 1964-65. The
experimental students did not receive instruction similar to that
of freshman composition (English I, English II). The control
students did. Therefore, the data in this table permit the basic
comparisons of the project: those between the performance of the
experimental and control subgroups on the criterion measures at
successive points in their college careers.

[Table II, surviving row]
Control    31    168.06    5.97    551.13    81.61    10.19    2.10




Table III presents the data on the performance on the COOP for
the sample available at the end of the fall semester (January,
1964). Two types of comparisons are presented: the change within
each subgroup from September to January and the comparison
between subgroups on the January performance.



For the experimental subgroup the test mean in January was [...]
the mean in September. The resulting t (df=165) was significant,
suggesting that the change in means was greater than could be
attributed to chance. A similar analysis for the control subgroup
shows that [...] significant [...].



When the experimental subgroup is compared with the control
subgroup on January test performance, the means differed by only
[...]. The resulting t-ratio, 1.15, fell short of significance.
The hypothesis of equal performance after an equal length of
college [...] is thus sustained in regard to performance on
the COOP at the end of the first semester.

The data in Table IV show the performance on the CEEB of the
[...]. As with the data presented in Table III, the data in
Table IV permit an analysis of the change within each subgroup
from September to January and a comparison of the January
performances of the two subgroups.

For the experimental subgroup, the January mean was 1.43
higher than the September mean (496.12 minus 496.69). This mean
gain was not significant; the t-ratio was .25. For the control
subgroup [...] was significant [...]. The control students
advanced more than did the experimental students on CEEB during
the fall semester.

*Significant at .05 level (two-tailed test); t of 1.98 or higher
required for significance at .05 level.

An examination of the January test scores reveals that the
mean for the control subgroup is 15.56 points higher than the
mean for the experimental subgroup. This difference is
significant (t=2.69). This is consistent with the data presented for
the within-subgroup gains.

The data for theme performance are presented somewhat
differently from the manner in which the objective test data are
reported. Braddock et al. suggest, and the investigators agree,
that theme evaluations can be considered comparable only when
three conditions are met: the evaluations are all made on the
same occasion, the evaluators are ignorant of the time of
writing, and the evaluators do not know which papers were written
by the experimental students and which were written by the
control students (1:10-11). Inasmuch as the experimental procedures
required September themes to be evaluated in October, 1963, and
the January, 1964, themes to be evaluated in May, 1964, the
September and January ratings are not comparable. Consequently,
only January, 1964, performance was analyzed. As Table V
shows, the mean for the 162 experimental students is 9.15 and the
mean for the 164 control students is 9.20. The difference of .05
is so small that it could easily be attributed to chance; i.e.,
the difference is not significant (t=.22).



Comparison of Criterion Scores--Sample Available May, 1964

Whereas Tables III, IV, and V were concerned with the first
semester of college, Tables VI, VII, and VIII present evidence
for the first and the second semesters. At the end of the second
semester, full data were available for 113 of the 166 matched
pairs for whom full data were available at the end of the fall
semester.

Table VI contains data obtained from the administration of
the COOP on three occasions: September, 1963; January, 1964; and
May, 1964. The first of the three parts of the table shows the
facts for September, 1963, and January, 1964. As with Tables III
and IV, the difference between the September and January
performances within each subgroup and the difference in January
performance between the two subgroups are shown.

The change in mean during the first semester was 3.64 for the
experimental subgroup and 3.89 for the control subgroup. For
each subgroup, the mean gain was significant: t-ratios were
6.75 and 7.92. A comparison of the January means for the two
subgroups revealed an advantage of .73 for the controls. This
mean difference at the end of the semester was not significant
(t=1.25).

It is useful at this point to consider certain aspects of
significant difference in this setting. In the preceding
paragraph it was noted that during the first semester the COOP mean
gain scores were significant, whereas the January COOP scores of
the experimental subgroup and the control subgroup did not differ
significantly. Specifically, a difference of .73 was not
significant. What kind of a difference between the two subgroups on
the January COOP testing would have been large enough to be
significant? This can be readily estimated. The obtained t value
was 1.25; the value needed for significance was 1.98. The
obtained t was 63% as large as the needed t. It follows that the
obtained mean difference was 63% as large as the mean difference
needed for significance. Therefore, if instead of the actual
difference of .73 the obtained mean difference had been 1.16, the
difference would have been significant at the .05 level. What
does this mean in terms of performance on the COOP test? If on
this 90-item test each student in one of the subgroups had given
one or two more correct answers than did his counterpart in the
other subgroup, and if standard deviations and correlations
remained about the same as those reported in Table VI, a
significant difference would have occurred.
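
The estimate above rests on the fact that t is the mean difference
divided by its standard error, so that with the standard error held
fixed the required difference scales in proportion to the required
t. The arithmetic can be checked with a few lines of Python; the
variable names are illustrative, and the three input figures are
those quoted in the text.

```python
T_OBTAINED = 1.25   # January COOP comparison, from the text
T_REQUIRED = 1.98   # two-tailed .05 level, from the text
MEAN_DIFF = 0.73    # obtained difference between subgroup means

# t = difference / standard error, so with the standard error held
# fixed the required difference is the required t times that error.
ratio = T_OBTAINED / T_REQUIRED
std_error = MEAN_DIFF / T_OBTAINED
required_diff = T_REQUIRED * std_error

print(f"obtained t is {ratio:.0%} of the required t")
print(f"difference needed for significance: {required_diff:.2f}")
```

The computation reproduces the figures of 63 per cent and 1.16 given
in the text.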

The second section of the table gives data for the second
semester--the period between January, 1964, and May, 1964. The
experimental subgroup had a mean gain in this period of .96, and
this was not significant (t=1.88). The control subgroup showed
a slight decrease in mean test score, .43; this decrease was not
significant (t=.84). On the May, 1964, testing the experimental
subgroup mean was .66 higher than the control subgroup mean.
This was not a significant difference (t=.88; df=112).

During the first semester, then, both subgroups made a
significant improvement in performance on the COOP; during the
second semester neither subgroup did.

The third section of Table VI presents the performance of
the two subgroups at the beginning of the first semester and at
the end of the second semester, 1963-64. Over this nine-month
period, the experimental subgroup showed an increase in mean test
score from 161.48 to 166.08. This mean gain of 4.60 was
significant (t=8.36). The control subgroup advanced in test mean from
161.96 to 165.42. This mean gain of 3.46 was also significant
(t=5.74). For each subgroup, the significant improvement during
the year was actually achieved during the first semester.
Apparently the experience reflected in the observed change in test
scores occurred during the first semester, and no experience
during the second semester resulted in a significant additional
change.



In the analyses stemming from Table VI, correlations between
sets of test scores are utilized. It may be noted that repeated
testing of the same individuals showed r's of the order of .70.
The correlations across matched pairs in May were approximately
.50. At the beginning of the experiment the correlation across
matched pairs was .76. The time interval between the testings
was nine months. The correlation data tend to confirm the idea
that the original matching was reasonably satisfactory.

Table VII, based upon the CEEB, is similar to Table VI,
which dealt with the COOP. The facts for the first semester are
in the upper section of Table VII. For the experimental subgroup,
the September, 1963, to January, 1964, mean change was minus
2.43--a slight decline from 503.62 to 501.19. The control
subgroup showed a change in mean from 498.17 (September) to 517.77
(January). The increase of 19.60 was significant (t=3.54). The
January means for the two subgroups differ by 16.58, and the
t-value of 2.36 indicates a significant advantage in favor of the
control subgroup.

The January and May CEEB test data appear in the middle
section of Table VII. It is noteworthy that, for the change scores,
the findings are a reversal of those just examined for the first
semester. During the second semester, the experimental subgroup
improved significantly, whereas the control subgroup did not.
The respective changes in mean test scores were 35.97 for the
experimental subgroup and 6.69 for the control subgroup. In the
comparison of the May test score means, the experimental subgroup
was 12.70 points higher than the control subgroup. This
differential was not quite significant (t=1.65; 1.98 required to
achieve significance). For the second semester, then, the
experimental subgroup started with a lower mean test rating than the
control subgroup and finished with a higher mean test rating.

By using the upper and middle sections of Table VII, it is
possible to explore some of the requirements for a significant
difference on the CEEB. Let us use as an example the difference
in means--control minus experimental--for January, 1964, and
May, 1964. The difference between subgroups in January, 16.58,
was significant; the difference in May, 12.70, was not
significant. Thus, in terms of the standard deviations and correlations
involved, a between-subgroup mean difference of about 14 or 15
was required for significance at the .05 level. For the CEEB
test an increase of one raw score point--one more question
right--is typically associated with an increase of about six
standard rating points. Thus if each member of one of the
subgroups had had two or three more correct responses than did his
counterpart in the other subgroup, the resulting subgroup means
would have differed significantly. The number of items on the
several forms of the CEEB ranges from 100 to 110.

The lower section of Table VII compares the September 
performance with the May performance. Both subgroups experienced 
a significant gain in mean test performance. The mean change for 
the experimental subgroup was 33.54 and for the control subgroup, 
26.29. The associated t-ratios were [illegible] and 4.14. This 
significant increment during the academic year took place during 
the first semester for the control subgroup and during the second 
semester for the experimental subgroup. These results mark an 
interesting contrast to the results on the COOP, in which both 
subgroups made a significant gain during the first semester, and 
no significant additional gain during the second semester. It 
should be remembered that the experimental students did not 
receive instruction in freshman composition during either 
semester, whereas the control students had such instruction both 
semesters. 

Table VIII contains the results of the administration of a 
theme in May, 1964. The theme rating for each student was the 
sum of two independent evaluations. It will be seen that the 
means for the experimental subgroup and the control subgroup are 
quite similar: 9.79 and 9.56. The difference of .23 may be 
interpreted in terms of a t-ratio of .86, which is not 
significant (t of 1.98 required at .05 level). 

It is appropriate to attempt some interpretation of the fact 
that the obtained t value of .86 was not significant, but one of 
1.96 would have been. Since a t value 2.29 times as large as the 
obtained one was required for significance, the mean difference 
would have had to be 2.29 times as large. That is, for 
significance, the between-subgroups mean difference would have 
needed to be .53 instead of the .23 obtained. Thus, if, on the 
theme in May, 1964, (about) 55 members of one subgroup had the 
same theme score as their counterparts in the other subgroup, but 
the other 55 members of the one subgroup scored one point higher 
than their paired counterparts, the resulting subgroup means 
would have differed significantly. This illustrative analysis is 
in terms of a level of confidence of .05 and of standard 
deviations and correlations similar to those in Table VIII. The 
possible range of Total Theme score was from 2 to 18. 
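
The scaling argument above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming (as the paragraph does) that the standard error of the difference stays fixed, so that the t-ratio is proportional to the mean difference. The function name is hypothetical, and the threshold of 1.98 is the one cited for Table VIII.

```python
def required_mean_difference(obtained_diff, obtained_t, critical_t):
    """Scale an observed mean difference up to the size needed for significance.

    Assumes the standard error of the difference is unchanged, so that
    t is proportional to the mean difference.
    """
    standard_error = obtained_diff / obtained_t  # implied by the reported t
    return critical_t * standard_error


# Values reported for the May, 1964, theme (Table VIII)
needed = required_mean_difference(obtained_diff=0.23, obtained_t=0.86, critical_t=1.98)
print(round(needed, 2))
```

Under these assumptions the result agrees with the .53 quoted in the text; a threshold of 1.96 instead of 1.98 would give a slightly smaller figure.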

It is interesting to note the theme ratings of these same 
110 matched pairs at the end of the first semester. The papers 
written in January and in May were read at the same time by the 
same team of readers; the readers did not know which papers were 
written by the experimental and which by the control students, 
nor which papers were written in January and which in May. In 
the light of these conditions, we can in this case compare 
performance at different testing occasions. The January, 1964, 
results were as follows: 



                  N      Mean     S.D.

Experimental     110     9.25     2.40
Control          110     9.37     2.31

                                             .23

The January means of 9.25 and 9.37 did not differ 
significantly. The gain during the second semester for the 
experimental subgroup was .54. The January-May correlation was 
[illegible]; the t-ratio of 1.80 was not significant. The gain 
for the control subgroup during the second semester was .19. 
[Remainder of paragraph illegible.] 

Comparison of Criterion Scores: Sample Available May, 1965 

Tables IX, X, and XI show the data through the first two 
years of college, the total interval covered in the present 
study.* They present evidence from four testing occasions for 
the 31 matched pairs of students for whom full data were 
available. Table IX shows student performance on the COOP. The 
first set of facts in Table IX is for the first semester of the 
freshman year (September, 1963-January, 1964). The experimental 
subgroup showed a mean gain of 1.96 during the first semester, 
and this was significant (t=2.11). The control subgroup showed a 
mean gain of 5.12, which was also significant (t=7.06). The 
January means were 164.77 and 167.77. This difference of 3.00 in 
favor of the control subgroup was significant (t=2.65). 

During the second semester of the freshman year, the 
experimental subgroup made a significant gain, while the control 
subgroup did not. The data appear in the second section of Table 
IX. The change in mean test performance was 3.04 (t=3.15) for 
the experimental students and .58 (t=.81) for the control 
students. 

*Research Project 3177, an extension of the present investigation, 
involves testing the entire senior class in the spring of [year 
illegible]. The students who have been involved in the present 
study will be among those tested at that time. 
TABLE IX 

THE PERFORMANCE OF 31 MATCHED PAIRS OF STUDENTS ON THE COOPERATIVE 
ENGLISH TESTS: ENGLISH EXPRESSION AT THE BEGINNING OF THE FIRST 
SEMESTER, AT THE END OF THE FIRST SEMESTER, AT THE END OF THE SECOND 
SEMESTER, AND AT THE END OF THE FOURTH SEMESTER; DIFFERENCES IN MEANS, 
AND t-RATIOS 

(Cooperative English Converted Scores) 

                      Sept. 1963         Jan. 1964      Diff. in Means:           t-Ratio
Subgroup        N    Mean    S.D.      Mean    S.D.     Jan. Minus Sept.    r     (df=30)

Experimental   31   162.81   6.34     164.77   7.13          1.96          .72     2.11*
Control        31   162.65   5.84     167.77   5.56          5.12          .75     7.06*

Difference in Jan. means (Control Minus Exp.): +3.00;  r = .53;  t-ratio (df=30) = 2.65*


                      Jan. 1964          May 1964       Diff. in Means:           t-Ratio
Subgroup        N    Mean    S.D.      Mean    S.D.     May Minus Jan.      r     (df=30)

Experimental   31   164.77   7.13     167.81   7.25          3.04          .72     3.15*
Control        31   167.77   5.56     168.35   5.70           .58          .75      .81

Difference in May means (Control Minus Exp.): +.54;  r = .42;  t-ratio (df=30) = .42


                      Sept. 1963         May 1964       Diff. in Means:           t-Ratio
Subgroup        N    Mean    S.D.      Mean    S.D.     May Minus Sept.     r     (df=30)

Experimental   31   162.81   6.34     167.81   7.25          5.00          .61     4.60*
Control        31   162.65   5.84     168.35   5.70          5.70          .74     7.62*

Difference in May means (Control Minus Exp.): +.54;  r = .42;  t-ratio (df=30) = .42


                      May 1964           May 1965       Diff. in Means:           t-Ratio
Subgroup        N    Mean    S.D.      Mean    S.D.     May Minus May       r     (df=30)

Experimental   31   167.81   7.25     168.29   7.45           .48          .72      .49
Control        31   168.35   5.70     168.06   5.97          -.29          .76      .40

Difference in May '65 means (Control Minus Exp.): -.23;  r = .47;  t-ratio (df=30) = .18


                      Sept. 1963         May 1965       Diff. in Means:           t-Ratio
Subgroup        N    Mean    S.D.      Mean    S.D.     May Minus Sept.     r     (df=30)

Experimental   31   162.81   6.34     168.29   7.45          5.48          .75     6.12*
Control        31   162.65   5.84     168.06   5.97          5.41          .72     6.81*

Difference in May means (Control Minus Exp.): -.23;  r = .47;  t-ratio (df=30) = .18

*t-ratio of 2.04 or higher required for significance at .05 level 
(two-tailed test) 
The effect of the gain of the control subgroup during the first 
semester and of the experimental subgroup during the second 
semester was that by May, 1964, the means of 167.81 and 168.35 
did not differ significantly (t=.42). 

The third section of Table IX covers the entire freshman 
year (September, 1963-May, 1964). During the nine months, the 
experimental subgroup mean increased from 162.81 to 167.81. The 
corresponding increase for the control subgroup was from 162.65 
to 168.35. The gains, 5.00 and 5.70 respectively, are both 
significant. The similarity between the two subgroups at the 
beginning and at the end of the academic year is noteworthy. 

The test results of May, 1964, and May, 1965, are presented 
in section four of Table IX. The end-of-year performances as 
freshmen and sophomores were strikingly similar: for the 
experimental subgroup, 167.81 and 168.29; for the control 
subgroup, 168.35 and 168.06. The small changes were, of course, 
not significant. The between-subgroup difference in May, 1965, 
was .23 (not significant; t=.18). 

The final section of Table IX contains test data obtained in 
September, 1963, and in May, 1965. Over the two academic years 
both the experimental subgroup and the control subgroup gained 
significantly. The mean gains were 5.48 for the experimentals, 
and 5.41 for the controls. 
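
The t-ratios in Table IX can be approximately rebuilt from the published means, standard deviations, and correlations. The sketch below is in Python and assumes the usual formula for the difference between two correlated means, with the number of pairs in the denominator of the standard error; the report does not state which formula was used, and the function name is hypothetical.

```python
import math


def correlated_t_ratio(mean_1, sd_1, mean_2, sd_2, r, n):
    """t-ratio for the difference between two correlated means.

    The variance of the paired differences is rebuilt from the two
    standard deviations and the correlation between the paired scores.
    """
    variance_of_differences = sd_1**2 + sd_2**2 - 2 * r * sd_1 * sd_2
    standard_error = math.sqrt(variance_of_differences / n)
    return (mean_2 - mean_1) / standard_error


# September, 1963, versus May, 1965, as reported in Table IX (31 matched pairs)
t_experimental = correlated_t_ratio(162.81, 6.34, 168.29, 7.45, r=0.75, n=31)
t_control = correlated_t_ratio(162.65, 5.84, 168.06, 5.97, r=0.72, n=31)
print(round(t_experimental, 2), round(t_control, 2))
```

Under these assumptions the two values come out close to the 6.12 and 6.81 shown in the table. Other rows may differ in the second decimal place, since the tabled correlations are rounded to two figures.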

The notable fact which may be derived from Table IX is that 
the students' performance on COOP did not improve during the 
second year of the interval (May, 1964-May, 1965). For the 
control subgroup, the significant improvement in performance 
occurred during the first semester of the first year; for the 
experimental subgroup the significant improvement came during the 
second semester of the first year. A plausible explanation of 
these data would be that the performance of the control group in 
January, 1964, is the result of instruction while the performance 
of the experimental group in May, 1964, is the result of 
maturation. Further, one more year of maturation had no visible 
effect on either group's performance on COOP. 

Table X covers the same span of time and performances as 
does Table IX: the beginning, middle, and end of the freshman 
year and the end of the sophomore year. The test scores were 
obtained from administrations of the CEEB. For the first 
semester of the freshman year, the experimental subgroup had an 
initial mean of 515.45 and an ending mean of 519.13. This gain 
of 3.68 was not significant (t=.33; df=30). The control subgroup 
advanced from a September mean of 515.90 to a January mean of 
TABLE X 

THE PERFORMANCE OF 31 MATCHED PAIRS OF STUDENTS ON THE COLLEGE ENTRANCE 
EXAMINATION BOARD ENGLISH COMPOSITION TEST AT THE BEGINNING OF THE FIRST 
SEMESTER, AT THE END OF THE FIRST SEMESTER, AT THE END OF THE SECOND 
SEMESTER, AND AT THE END OF THE FOURTH SEMESTER; DIFFERENCE IN MEANS, 
AND t-RATIOS 

(College Entrance Examination Board Standard Rating) 

                      Sept. 1963          Jan. 1964       Diff. in Means:           t-Ratio
Subgroup        N    Mean     S.D.      Mean     S.D.     Jan. Minus Sept.    r     (df=30)

Experimental   31   515.45   66.67     519.13   72.21          3.68          .61      .33
Control        31   515.90   64.84     537.90   82.64         22.00          .74     2.20*

Difference in Jan. means (Control Minus Exp.): +18.77;  r = .67;  t-ratio (df=30) = 1.64


                      Jan. 1964           May 1964        Diff. in Means:           t-Ratio
Subgroup        N    Mean     S.D.      Mean     S.D.     May Minus Jan.      r     (df=30)

Experimental   31   519.13   72.21     558.87   74.60         39.74          .74     4.18*
Control        31   537.90   82.64     557.84   82.43         19.94          .61     1.52

Difference in May means (Control Minus Exp.): -1.03;  r = .62;  t-ratio (df=30) = .08


                      Sept. 1963          May 1964        Diff. in Means:           t-Ratio
Subgroup        N    Mean     S.D.      Mean     S.D.     May Minus Sept.     r     (df=30)

Experimental   31   515.45   66.67     558.87   74.60         43.42          .75     4.79*
Control        31   515.90   64.84     557.84   82.43         41.94          .62     3.53*

Difference in May means (Control Minus Exp.): -1.03;  r = .62;  t-ratio (df=30) = .08


                      May 1964            May 1965        Diff. in Means:           t-Ratio
Subgroup        N    Mean     S.D.      Mean     S.D.     May Minus May       r     (df=30)

Experimental   31   558.87   74.60     540.55   72.72        -18.32          .83     2.37*
Control        31   557.84   82.43     551.13   81.61         -6.71          .65      .54

Difference in May '65 means (Control Minus Exp.): +10.58;  r = [illegible];  t-ratio (df=30) = .72

[Final section (Sept. 1963 to May 1965) missing in source.] 

537.90. This 22-point gain was significant (t=2.20; df=30). The 
January, 1964, mean for the control subgroup was 18.77 higher 
than the mean for the experimental subgroup, and this difference 
was not significant (t=1.64; df=30). 

The evidence for the second semester of the freshman year 
appears in the second section of Table X. The experimental 
subgroup had a mean change of 39.74 (558.87 minus 519.13); this 
was significant as revealed by a t-ratio of 4.18. The control 
subgroup had a mean change of 19.94 (557.84 minus 537.90); this 
was not significant in terms of a t-ratio of 1.52. The May, 1964, 
test means of the two subgroups differed by only 1.03. The 
t-ratio was .08. 

Evidence over the total freshman year is presented in the 
third section of Table X. Significant gains between September 
and May are noted for both the experimental subgroup and the 
control subgroup: 43.42 (experimental) and 41.94 (control). The 
corresponding t-values were 4.79 and 3.53. 

The last two sections of Table X include the test means for 
May, 1965. The fourth section shows that between May, 1964, and 
May, 1965, both subgroups declined somewhat in their performance. 
The negative change of 18.32 for the experimental subgroup was 
significant (t=2.37; df=30). The negative change of 6.71 for the 
control subgroup was not significant (t=.54; df=30). The May, 
1965, subgroup means were 540.55 for the experimental subgroup 
and 551.13 for the control subgroup. This difference of 10.58 
was not significant (t=.72; df=30). 

The final section of Table X covers a span of two academic 
years. When the September, 1963, CEEB scores are compared with 
the May, 1965, scores, a significant gain is noted for both 
subgroups. These mean gains were 25.10 (experimental) and 35.23 
(control). The associated t-ratios were 2.08 and 2.95. 

Table XI shows the facts regarding the theme written at the 
end of the sophomore year. It will be remembered that the theme 
of each student was read by two raters and the student's score 
was the sum of the two independent ratings. The means for the 
experimental subgroup and the control subgroup were strikingly 
similar: 10.23 and 10.19. One may acquire some notion of the 
distribution of theme ratings within subgroups from the obtained 
sigmas of 2.62 and 2.10. The across-subgroups correlation for 
the 31 matched pairs was .23. 

Tables II through XI have compared the performance of the 
experimental subgroup and the control subgroup on the three 
TABLE XI 

THE PERFORMANCE OF 31 MATCHED PAIRS OF STUDENTS: 
TOTAL OF TWO THEME RATINGS AT THE END OF THE FOURTH 
SEMESTER (MAY, 1965); DIFFERENCES IN MEANS, AND t-RATIOS 

                        Theme Ratings     Difference in Means:            t-Ratio
Subgroup        N       Mean     S.D.     Control Minus Exp.       r      (df=30)

Experimental   31      10.23     2.62
Control        31      10.19     2.10            -.04             .23       .08

criterion measures at three testing junctures. At the end of the 
first semester (January, 1964), there was evidence for 166 matched 
pairs of students; at the end of the second semester (May, 1964), 
there was evidence for 113 of the 166 matched pairs; at the end of 
the fourth semester (May, 1965), there was evidence for 31 of the 
matched pairs. These are the three main subsamples. 

It will be seen from Table XII that on the COOP none of the 
three main subsamples showed a significant mean difference 
between subgroups. That is, when comparisons were made at each 
of the three testing points using the maximum number of available 
matched pairs, the hypothesis of equality of performance of 
experimental subgroup and control subgroup was supported. 

The evidence was different for CEEB. Here, the end of the 
first semester marked a superiority for the control subgroup over 
the experimental subgroup. This significant superiority did not 
persist through the second semester or the second academic year. 

Theme scores were such that in each of the three key compar- 
isons there was no significant difference between subgroup means. 

Further inspection of Table XII reveals that for COOP one of 
the three secondary comparisons yielded a significant mean 
difference: for 31 matched pairs as of January, 1964, the control 
subgroup was superior. For CEEB there was also a significant 
difference in favor of the control subgroup in one of the three 
secondary comparisons: for 113 matched pairs as of January, 1964. 
Theme scores did not produce any significant between-subgroup 
differences in the three secondary comparisons. 



INTERCORRELATIONS AMONG VARIABLES 
September, 1963: Total Group 

Table XIII contains coefficients of correlation between all 
possible pairs of nine variables for 910 new freshmen at the 
State College of Iowa at the beginning of the fall semester, 
1963. Two of the variables, ACT Composite and Percentile Rank in 
High School Graduating Class, customarily serve as indices of 
over-all high school accomplishment and of general potential for 
college study. The other seven variables are test scores in the 
area of language arts. These intercorrelation data, together 
with the related means and standard deviations reported in Table 
I (p. 21), provide a description of the 1963 SCI freshman class 
pertinent to an investigation of their writing performance. 

The highest correlations were between each of the independ- 
ent theme ratings and the sum of these two (Total Theme score); 



TABLE XII 

A SUMMARY COMPARISON OF TEST SCORE MEANS OF EXPERIMENTAL SUBGROUP 
AND CONTROL SUBGROUP AT THREE TESTING POINTS: INDICATION OF 
STATISTICALLY SIGNIFICANT DIFFERENCES FOR SPECIFIED SUBSAMPLES 

Did one of the two subgroups have a significantly 
higher mean than the other? 

                                 January 1964               May 1964       May 1965
                           N=166     N=113     N=31       N=113    N=31      N=31

Cooperative English         No        No       Yes         No       No        No
Tests: English                               (Control)
Expression

College Entrance            Yes       Yes       No         No       No        No
Examination Board        (Control) (Control)
English Composition Test

Theme                       No        No        No         No       No        No

[Variable labels surviving from the accompanying correlation table: 
theme rating assigned by Reader 2 (September, 1963); sum of theme 
ratings assigned by Readers 1 and 2 (September, 1963); percentile 
rank in high school graduating class.] 


the r's were .85 and .78. It is of special interest that the 
between-readers r was .35. 

Which of the three objective tests correlated highest with 
the Total Theme Rating? Both the ACT English and the CEEB showed 
an r of .43, whereas for COOP the r was .33. How did the three 
objective tests correlate with one another? The three 
intercorrelations were [illegible], .59, and .65 (the last, ACT 
English with CEEB). Since the ACT English score was utilized in 
selecting the members of the experimental pool (basic planning 
which had to be conducted in the summer before the students 
entered college), it is of interest to see that ACT correlated 
reasonably well with the criterion measures. The correlation is 
all the more noteworthy when one remembers that the ACT battery 
of tests was in all cases taken at least four months prior to the 
September administration of the criterion tests, and may have 
been taken as many as 10 months before. 



January, 1964: End of First Semester 

Table XIV is an intercorrelation matrix involving 15 
variables. The 332 students are the combined experimental and 
control subgroups completing the first semester of the 1963-64 
academic year with complete data. These 332 students were a part 
of the 910 for whom beginning-of-the-year intercorrelation data 
for nine variables were presented in Table XIII. 

The six additional variables available in January were the 
January values for CEEB, COOP, Theme Rating One, Theme Rating 
Two, Theme Total, and grade-point average for the fall semester. 
Of the September variables, ACT English showed the highest 
relationship with January theme total (r = [illegible]). Another 
one of the September-January comparisons is between theme total 
on the two occasions; the obtained r was .40. Still another 
September-January relationship of interest is for the scores on 
CEEB; for this, the r was .69. The corresponding figure for COOP 
was .59. The September measure which showed the highest 
relationship with first semester grade point average was ACT 
Composite; the r was .61. 



Intercorrelation data for the subgroup of 166 experimental 
students appear in Table XV. For this subgroup, the January 
theme total was best predicted by the September ACT English score 
(r=.55) and CEEB (r=.53). For COOP the corresponding figure was 
.45. In general, the r's for the 166 experimental students were 
not markedly different from the r's for the experimental subgroup 
plus the control subgroup (N=332). 
At the end of the fall semester, grades in freshman composition 
were available for the 166 control students. From the evidence 
in Table XVI, it is possible to rate the various instruments in 
predicting students' success in the composition course. The 
following listing shows the correlation between first semester 
grade in composition and evidence available in September on each 
of several indexes. 



ACT English                  .54
ACT Composite                .46
CEEB                         .44
COOP                         .40
SUI Reading                  .37
Percentile Rank in H.S.      .33
Sept. Theme Total            .33



Thus both theme performance in January and the grade earned in 
the first semester freshman composition course are best predicted 
by the ACT English score. This finding is [several words 
illegible] during the students' senior year in high school. 



May, 1964: End of First Year 

For the combined subgroups available at the end of the first 
year of college (N = 113 + 113), the intercorrelation matrix 
involved [number illegible] variables. Table XVII presents the 
r's for selected pairs of variables. In the first section of the 
table, the r's are for total theme rating in May and each of the 
September scores. In addition to the r's for the combined 
subgroups, the table also shows the coefficients by subgroup. 

For samples in which N=113, a correlation must be as large as 
.18 to be significant at the .05 level of confidence. Ten of the 
twelve correlations between September scores and May theme total 
were significant; that is, they indicated that in the population 
sampled the correlation was greater than zero. Among the twelve 
comparisons, the r's ranged from [illegible] to .29. 

It is interesting to note the relationship between the 
September scores and the May scores for each of the tests which 
were repeated. For the combined subgroups the r's were .67 for 
COOP, and .64 for CEEB. 
INTERCORRELATIONS AMONG 16 VARIABLES FOR 166 MATCHED PAIRS 
OF STUDENTS AT THE END OF THE FALL SEMESTER, 1963-64 

[The entries of this intercorrelation matrix are not recoverable 
from the source.] 

In May, 1964, the 113 control students completed their 
second semester of freshman composition. Their marks in this 
course showed a negligible relationship with September test 
scores. The highest r was .17 for September theme total. 



May, 1965 — End of Second Year 

In Table XVIII the correlation facts are for the 31 matched 
pairs of students for whom full data were available through May, 
1965 (end of sophomore year). It is realized that for such a 
small sample the obtained r's are relatively unreliable. For an 
N of 31, an r as large as .36 is required for significance at the 
.05 level (two-tailed interpretation). For an N of 62, the 
corresponding correlation value is .25. 
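
The thresholds quoted above can be approximated from the standard relation between a correlation and its t-ratio, t = r * sqrt(N - 2) / sqrt(1 - r^2), solved for r. The sketch below is in Python; the critical t values are assumed inputs read from an ordinary two-tailed .05 table, and the function name is hypothetical.

```python
import math


def critical_r(n, critical_t):
    """Smallest correlation that reaches significance for a sample of size n.

    Inverts t = r * sqrt(n - 2) / sqrt(1 - r**2) for r.
    """
    degrees_of_freedom = n - 2
    return critical_t / math.sqrt(critical_t**2 + degrees_of_freedom)


# Two-tailed .05 critical t values from a standard table (assumed inputs)
for n, t_crit in [(31, 2.045), (62, 2.000), (113, 1.982)]:
    print(n, round(critical_r(n, t_crit), 3))
```

With these inputs the values fall near the .36, .25, and .18 used in this section for samples of 31, 62, and 113.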

One of the kinds of evidence contained in Table XVIII is the 
relationship between variables over a span of two academic years. 

The first section of the table shows correlations between 
end-of-sophomore-year theme total rating and 
beginning-of-freshman-year indexes. The largest r was for 
September, 1963, ACT English: [value illegible] for the combined 
subgroups (N=62). 

The September, 1963-May, 1965, correlation for each of the 
two objective tests is presented in the second section. The COOP 
r was [value illegible] for the combined subgroups; CEEB yielded 
a corresponding r of .57. 

For the control subgroup there is evidence concerning 
end-of-sophomore-year theme total and end-of-freshman-year grade 
in the second semester English composition course. The obtained 
r of -.02 is interesting. In contrast to this low correlation 
between theme and course grade, the correlation between May, 
1965, theme total and May, 1964, theme total was .47 (control 
subgroup). 



RELIABILITY OF CRITERION MEASURES 

Research in English composition hinges on the reliability 
and validity of the measuring instruments employed. This section 
presents certain evidence concerning the reliability of the three 
criterion tests employed in the present investigation. 



Cooperative English Tests: English Expression 

This instrument, published in 1960, is composed of two 
parts: “Part I: Effectiveness,” thirty items; and “Part II: 
Mechanics,” sixty items. The time limits are 15 minutes and 25 
minutes respectively. A student's score is the total number of 
correct responses. This raw score is transformed into a Converted 
Score by means of a table provided by the publishers of the test. 
For Form 1A, the possible range in converted scores is from 115 
(raw score of 0) to 191 (raw score of 90). For the two forms of 
the test (1A, 1B) recommended for use with college freshmen and 
sophomores, the investigators were able to find reliability facts 
only for the twelfth grade level. The correlation between 
parallel forms was .86 and the standard error of measurement was 
on the order of 4.00 converted score units. 

[Table title fragment: CORRELATION BETWEEN SELECTED PAIRS OF 
VARIABLES FOR THE 31 MATCHED ...] 



The College Entrance Examination Board English Composition Test 

This is one of the CEEB achievement tests. Evidence about 
the functioning of this instrument seems to be directly concerned 
with validity. This is reflected in one of the earlier reports 
on the instrument, which appeared with the title “Composition 
Test Shows High Validity on Reliable Criterion of Writing Ability” 
(2). The excellent 84-page report called The Measurement of 
Writing Ability (3) also dealt primarily with the validity of the 
College Entrance Examination Board English Composition Test 
(CEEB). It is realized that to achieve validity a test author 
must at the same time achieve reliability. A third source of 
information was The Sixth Mental Measurements Yearbook. Holland 
Roberts, one of the three reviewers of the test, commented on 
reliability: “For the composition test a Kuder-Richardson 
formula 20 reliability of .85 and a standard error of measurement 
of 39 is reported, indicating satisfactory discrimination among 
the members of the test group.” (5:590) 



The Theme 



The theme test consisted of an impromptu paper 300 to 
[illegible] words in length. Students were allowed up to two 
hours to write the paper. A new topic was used at each testing 
session, but at no testing session was more than one topic 
provided. Typically, the topic consisted of a quotation set in a 
framework intended to link the topic and the student's experience 
(see Appendix A). Experimental and control students wrote the 
paper at the same time under similar conditions, usually in a 
period between 3:00 and 5:00 p.m. 

Each theme was evaluated by two independent readers (see 
discussion p. 18). Each reader assigned each paper a numerical 
value on a nine-point scale. It is thus possible to examine the 
extent of between-reader agreement in assigned ratings. 

The investigators analyzed the scores assigned to 1,070 
students, essentially the total incoming freshman class for 
September, 1963, at the State College of Iowa. Table XIX is a 
frequency distribution of the amount of difference between the 
two ratings for each paper. 



It will be noted from Table XIX that 260 of the 1,070 themes 
received the same rating by two readers working independently. 
Another 432 themes were rated within a point of one another. On 
only 23 of the 1,070 themes was there an inter-reader disparity 
of four points or more. As each rater was marking on a 9-point 
scale (9 high, 1 low), there was a potential disparity of 8 
points (9 minus 1) between ratings. This analysis seems to 
suggest that the themes were evaluated with considerable 
consistency. In the light of this kind of analysis, the student 
scores (the total of the two independent ratings) appear to be 
sufficiently reliable. 
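
The agreement figures in this paragraph can be tallied directly from the frequencies in Table XIX. The following Python sketch is offered only as an illustration; the dictionary holds the six frequencies as read from the table, and the helper name is hypothetical.

```python
# Frequencies from Table XIX: size of the difference between the two
# independent ratings -> number of themes
difference_counts = {0: 260, 1: 432, 2: 283, 3: 72, 4: 20, 5: 3}


def share_within(counts, max_difference):
    """Proportion of themes whose two ratings differ by at most max_difference."""
    total = sum(counts.values())
    agreeing = sum(n for diff, n in counts.items() if diff <= max_difference)
    return agreeing / total


total_themes = sum(difference_counts.values())
four_or_more = sum(n for diff, n in difference_counts.items() if diff >= 4)

print(total_themes, four_or_more)
print(round(share_within(difference_counts, 0), 3))  # identical ratings
print(round(share_within(difference_counts, 1), 3))  # within one point
```

On these counts roughly a quarter of the themes received identical ratings, and about two thirds were rated within one point of each other.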

The above straightforward analysis of the extent of 
agreement on independent ratings of the same theme is the most 
meaningful basis for thinking about the theme reliability. When 
one moves to the tricky problem of producing a reliability 
coefficient, interpretations are exceedingly complex. One way of 
obtaining a reliability estimate is to conceive of this as a 
single-test-form reliability situation, involving a 9-point test. 
We would actually be studying the consistency of two independent 
ratings of a single test. It is a “short test” in terms of 
maximum possible score. Table XIV, page 46, shows that for 910 



TABLE XIX 

FREQUENCY DISTRIBUTION OF THE DIFFERENCE IN TWO 
INDEPENDENT RATINGS ASSIGNED TO EACH OF 1,070 THEMES 

Difference in
Two Theme Ratings         N

        0               260
        1               432
        2               283
        3                72
        4                20
        5                 3


students the Reader 1-Reader 2 r was .35. This is a conservative 
estimate of reader reliability. 

Another way to express the reliability of the theme is to 
regard the sum of the two ratings as a total test score, and each 
rating as the score on a half test.* This would put it in the 
context of an 18-point test. Data on the difference between the 
two independent ratings of each of the themes may be used directly 
in arriving at a reliability estimate, using a procedure outlined by 
Rulon (6:99-103). The standard deviation of the distribution of 
differences in theme ratings was .970. This may be used as the 
estimated standard error of measurement. The standard deviation 
of the distribution of total theme scores for the entire freshman 
class was 2.64. The reliability coefficient is then computed 
from $r_{11} = 1 - \frac{.970^2}{2.64^2}$. This produces an r of 
.87, a spuriously high coefficient of reader reliability. 
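
The computation just described is short enough to verify directly. The sketch below, in Python, evaluates one minus the ratio of the variance of the rating differences to the variance of the total scores, using the two standard deviations given in the text; the function name is hypothetical.

```python
def rulon_reliability(sd_of_differences, sd_of_totals):
    """Reliability estimate: 1 minus (difference variance / total-score variance)."""
    return 1 - sd_of_differences**2 / sd_of_totals**2


estimate = rulon_reliability(sd_of_differences=0.970, sd_of_totals=2.64)
print(round(estimate, 3))
```

The result is very close to .865, in line with the coefficient of .87 reported above.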

Different methods for estimating theme reliability lead to 
such varying coefficients (of reader reliability), and the 
relevance of each technique to the present situation is so 
difficult to assess, that the most meaningful analysis for 
present purposes seems to the investigators to be that presented 
in Table XIX in the form of a distribution of differences in 
ratings of themes by two independent readers. 

It is of interest to note that readers, when giving a paper 
its second reading, showed a slight tendency to be more generous 
in their evaluation than in their first reading. On 283 papers, 
the independent evaluations differed by two points. In 126 
instances the first reading was higher; in 157 instances the
second reading was higher. 

PERFORMANCE BY SEX AND ABILITY LEVEL
Performance by Sex 

Since a systematic superiority of women on tests of composition
ability is ordinarily expected, the proportions of males and
females in the sample were determined by the proportions in the
entire entering freshman class. Furthermore, sex was one of the
factors used in establishing the matched pairs. To determine
whether this presumed superiority of women was manifest in the
present study, the investigators analyzed some of the data by
sex. Table XX shows the facts for the 113 matched pairs of
students for whom full data were available through the second
semester of college (May, 1964). The student performance by sex is
reported for the three testing points of the freshman year
(September, 1963; January, 1964; and May, 1964) on the three
criterion measures.



*For an interesting report of the use of multiple readings of
themes see Godshalk, Swineford, and Coffman. (3)

TABLE XX

PERFORMANCE OF 113 MATCHED PAIRS OF STUDENTS, BY
SEX, ON THREE CRITERION MEASURES AT THE
BEGINNING, MIDDLE, AND END OF FIRST YEAR OF COLLEGE

COOP:  COOP English Test: English Expression (1960), Converted Score
CEEB:  CEEB English Composition Test, Standard Rating
Theme: Sum of Two Ratings

Subgroup                                  COOP             CEEB           Theme
and Sex           N  Testing            Mean   S.D.      Mean   S.D.    Mean  S.D.

Experimental         September, 1963
  Men            30                   159.63   8.26    505.40  71.60    8.10  2.51
  Women          83                   162.07   7.07    502.98  65.83    8.63  1.82
  Total         113                   161.48   7.47    503.62  67.42    8.49  2.04

Control              September, 1963
  Men            30                   160.30   7.36    505.63  75.98    8.07  2.44
  Women          83                   162.55   6.54    495.47  69.65    8.64  1.82
  Total         113                   161.96   6.84    498.17  71.53    8.49  2.02

Experimental         January, 1964
  Men            30                   164.10   7.57    477.97  73.83    8.31  2.28  (N=29)
  Women          83                   165.48   6.84    509.59  80.96    9.58  2.35  (N=81)
  Total         113                   165.12   7.06    501.19  80.35    9.25  2.40  (N=110)

Control              January, 1964
  Men            30                   164.37   6.20    519.97  70.14    9.43  2.43
  Women          83                   166.39   5.05    516.98  77.09    9.34  2.26  (N=82)
  Total         113                   165.85   5.45    517.77  75.32    9.37  2.31  (N=112)

Experimental         May, 1964
  Men            30                   163.13   9.86    527.13  67.82    9.70  2.04
  Women          83                   167.14   7.03    540.78  81.45    9.82  2.04
  Total         113                   166.08   8.08    537.16  78.29    9.79  2.04

Control              May, 1964
  Men            30                   163.63   7.97    515.47  85.81    9.10  2.50
  Women          83                   166.06   7.35    527.67  78.40    9.72  1.80
  Total         113                   165.42   7.60    524.46  80.61    9.56  2.02



On COOP, the mean performance of the 83 women in the
experimental subgroup was consistently somewhat higher than that of
the 30 men. This situation also prevailed for the control subgroup.
The observed superiority seems to have been slightly greater in
May (about three points) than in the previous September (about
two points).

On CEEB the differences in mean performance by the male and
female components of the sample are not consistently in the same
direction. The males tested somewhat higher than the females in
September, but by May the direction of superiority had been
reversed. This is equivalent to saying that during the freshman
year the observed mean gain by the female students was greater
than that by the male students. Following is an analysis of
these shifts.

                                                          Difference:
                     Sept. 1963 CEEB   May 1964 CEEB   May 1964 Minus
                  N     Mean     S.D.     Mean     S.D.    Sept. 1963         r      t

Experimental
  Male           30   505.40    71.60   527.13    67.82         21.73       .54   1.78
  Female         83   502.98    65.83   540.78    81.45         37.80       .70   5.85

Control
  Male           30   505.63    75.98   515.47    85.81          9.84   (illegible)
  Female         83   495.47    69.65   527.67    78.40         32.20       .60   4.40

For the male subsamples (N=30), a t-value of 2.04 is required
for significance at the .05 level; for the female subsamples
(N=83), a t-value of 1.99. Thus the mean gain of the females in
both the experimental subgroup and the control subgroup was
highly significant.

For the males in the experimental subgroup the mean gain was
not quite significant, and in the control subgroup the mean gain
did not approach significance. Clearly, women outgained men. For
this subsample, the experimental men outgained control men on
CEEB.
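
The t-values in this analysis can be checked from the reported summary statistics alone. The sketch below is illustrative rather than the investigators' own procedure: it assumes the usual formula for the significance of a mean gain between two correlated testings, with the standard error based on N, and the function name is hypothetical. The inputs are the figures reported above for the two female subsamples.

```python
import math


def paired_t(mean_gain, sd_first, sd_second, r, n):
    """t for a mean gain between two correlated testings, from summary statistics."""
    # Variance of the individual gains: s1^2 + s2^2 - 2*r*s1*s2
    var_gain = sd_first ** 2 + sd_second ** 2 - 2.0 * r * sd_first * sd_second
    std_error = math.sqrt(var_gain / n)  # standard error of the mean gain
    return mean_gain / std_error


# Female subsamples (N=83), CEEB, September 1963 to May 1964
t_exp = paired_t(37.80, 65.83, 81.45, 0.70, 83)
t_ctl = paired_t(32.20, 69.65, 78.40, 0.60, 83)
print(f"experimental women: t = {t_exp:.2f}")
print(f"control women:      t = {t_ctl:.2f}")
```

Both results fall well above the 1.99 required for significance at the .05 level with N=83, and they agree with the reported values of 5.85 and 4.40 to within rounding.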



A superficial interpretation of this CEEB evidence is that
women may be neither handicapped nor benefitted by instruction in
freshman composition whereas men may be [...]
should be an investigation of the unusual [...]
freshman composition course may have an inhibitory effect on the
writing improvement of male students. [...] males [...]
suggest that in any composition study in which the ratio of males
to females is not held constant for all treatment groups,
"effects" may improperly be attributed to treatments instead of
to the male-female imbalance among the groups.



On the theme ratings, reported in Table XX, there was a
slight tendency for the females to score higher than the males.
The one exception was in the control subgroup for the January,
1964, testing. In May, 1964, within the experimental subgroup,
the mean for males was 9.70, and the mean for females was 9.82.
Within the control subgroup, the means were 9.10 (male) and 9.72
(female).

The evidence summarized in this section indicates that on
all three criterion measures the females did indeed tend to
perform somewhat better than the males. On the CEEB test the
comparisons of gain scores between September and May show a
startling superiority for the females, especially in the control
group. Had the females dominated one group and the males the
other without the investigators' being aware of it, erroneous
conclusions could easily have been drawn. The total evidence
supports the wisdom of maintaining an equal ratio between the
sexes in experimental and control groups in research concerning
composition.
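
The risk described here can be made concrete with a small worked example. The sketch below is hypothetical: the group compositions (80 percent and 20 percent women) are invented for illustration, and the function name is an assumption. The two gains are the September-to-May CEEB mean gains of the control men and women; the male figure of 9.84 is computed from the Table XX means (515.47 minus 505.63).

```python
def group_mean_gain(prop_female, gain_male, gain_female):
    """Mean gain of a group treated as a mixture of male and female mean gains."""
    return (1.0 - prop_female) * gain_male + prop_female * gain_female


# Observed CEEB mean gains of the control subgroup, September to May
GAIN_MALE, GAIN_FEMALE = 9.84, 32.20

# Hypothetical compositions: both groups receive the SAME treatment
mostly_women = group_mean_gain(0.80, GAIN_MALE, GAIN_FEMALE)
mostly_men = group_mean_gain(0.20, GAIN_MALE, GAIN_FEMALE)
spurious_effect = mostly_women - mostly_men
print(f"apparent 'treatment effect': {spurious_effect:.2f} points")

# With the sex ratio held equal (83 women in 113), the artifact vanishes
ratio = 83 / 113
balanced_difference = (group_mean_gain(ratio, GAIN_MALE, GAIN_FEMALE)
                       - group_mean_gain(ratio, GAIN_MALE, GAIN_FEMALE))
```

In this illustration two identically treated groups would differ by more than 13 points in mean gain solely because of their composition, which is the kind of erroneous conclusion that matching on sex guards against.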



Performance by Ability Level 

Another consideration in methods experiments is the
possibility that the effect may not be the same at all levels of
student ability. It is conceivable, for example, that in an
investigation such as the present one, the omission of formal course
work in composition might have a negative effect on low-ability
students but not on high-ability students. The point involved is
whether or not what is true over-all is true at specified ability
levels.

When the research was planned, it was decided to include
some analysis of data which would present the evidence for
students at four levels of ability. To make this auxiliary analysis,
the 113 matched pairs of students for whom complete data were
available through May, 1964 (the freshman year) were used. It
seemed desirable to establish the four ability levels by some
measure which was at hand before the beginning of the freshman
year. The ACT English scores were such measures. Records
indicated that one could divide the total incoming freshman class in
September, 1963, into four ability levels of essentially equal
size by using the ACT standard score intervals of 25 and above,
23-24, 21-22, and 20 and below. To explore performance at
different ability levels CEEB was used.
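
The four-way division described above amounts to a simple classification rule. The following sketch shows one way to express it; the function name and the sample scores are hypothetical, and whole-number ACT English standard scores are assumed.

```python
def act_ability_level(act_english):
    """Map an ACT English standard score to one of four ability levels (1 = highest)."""
    if act_english >= 25:
        return 1  # 25 and above
    if act_english >= 23:
        return 2  # 23-24
    if act_english >= 21:
        return 3  # 21-22
    return 4      # 20 and below


# Hypothetical scores, for illustration only
scores = [27, 24, 23, 22, 21, 20, 15]
levels = [act_ability_level(s) for s in scores]
print(levels)
```

Grouping each matched pair's CEEB gains by the level returned here would yield the kind of breakdown reported in Table XXI.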

Table XXI presents the CEEB data for the 113 matched pairs
grouped by the ACT standard score intervals. It may be noted,
first of all, that the N's are not uniform. This variation in
N's exists from level to level and, to a lesser extent, between
experimentals and controls at each level. It would be
anticipated that attrition would be more noticeable among the students
in the lower-ability groups. This was true in the present
situation except that the smallest N's were at the third rather than
the fourth ability level. This pattern of frequencies was
already present in the sample of 166 matched pairs which completed
the first semester. In January, 26 of the 166 individuals (16%)
in each of the two subgroups were located at the third ability
level. Of the 113 matched pairs in May, 14 of the experimentals
(12%) and 20 of the controls (18%) were at the third ability
level.



Of principal interest in Table XXI is the column headed
"CEEB May Minus Sept.," which contains information concerning the
experimental-control relationship at each of four ability levels
in addition to the facts concerning main effect. At how many, if
any, of the four ability levels were the findings substantially
different from the over-all findings? Only at the third level,
the level with the smallest N's, is the evidence different from
the over-all picture. For the 14 experimental students at the
third ability level, the mean gain was 49.22, and for the 20
control students, -8.20. This evidence may or may not be suggestive
of actual interaction. At all four levels, and over-all, the
experimentals are somewhat higher than the controls on
September-May gain on the CEEB. The provocative fact is that at the third
level this advantage is conspicuously greater than at the other
three levels. (See Table VII for overall CEEB data for 113
matched pairs on September, January, and May testing.)

If one looks at the evidence in Table XXI from the stand- 
point of student ability level, disregarding the treatment fac- 
tor, it is evident that the amount of gain on CEEB, September to 
May, was at least as great in the upper one-half of the ability 
breakdown as it was in the lower one-half. This is in contrast 
to the view frequently held that the greater gains will be at the 
lower levels, the lower gains at the higher levels. 



CONCLUSIONS AND OBSERVATIONS 



The objective of the total research project was to test two
hypotheses. The first of these was that the writing performance
of students enrolled in a college freshman composition sequence 
is not significantly different from the writing performance of 
comparable students not enrolled in a college freshman composi- 
tion sequence when the two subgroups have attended college for an 
equal length of time. The second hypothesis was that evidence 
from a single sample at one institution would be confirmed by 
evidence from a second sample at the given college and by evidence 
from other colleges and/or universities. The present discussion 
relates to the test of the first hypothesis. The test of the 
second hypothesis must await the analysis of evidence collected 
from the five institutions which replicated the study described 
in this report. 

It should be emphasized that the investigators are presenting
these conclusions and observations on the basis of the pilot
phase of the study alone, without any utilization of evidence
from the major phase, the phase involving replication by five
institutions. This is the fair thing to do, and it is also the
best way for the investigators to make maximum use of the pilot 
phase, one of the purposes of which was to provide experience 
useful in the major phase. It is clearly understood that the 
evidence for the major phase, to be reported in 1967, will con- 
stitute a far more dependable basis for conclusions and observa- 
tions than does the evidence in the present document. 



The Basic Findings 



In the pilot phase, statistics were obtained for three eval- 
uative measures applied at the beginning of the experiment and at 
three subsequent times. The hypothesis was thus subjected to 
review with each of three testing instruments on each of three 
testing occasions. The main subsamples (subgroups) were students, 
from an original 210 matched pairs, who remained in the experiment 
at least one semester; some of the students persisted through all 
four semesters involved in the pilot study. There were nine main 
"tests" of the null hypothesis. Only one of them led to a rejection
of the hypothesis. The following tabular presentation shows
these facts. Examination of the statistical portion of the present 
report and of this summary table forces the generalization that the 
first hypothesis has been sustained. The only point at which sig- 
nificant difference was found between major subgroups at a partic- 
ular testing period was at the end of the first semester of 
instruction. The difference, in favor of the control students
(those receiving instruction), is on CEEB, an objective test.



NINE MAIN COMPARISONS OF EXPERIMENTAL
SUBGROUP AND CONTROL SUBGROUP:

Was the null hypothesis rejected?

               January, 1964      May, 1964          May, 1965
               (N=166             (N=113             (N=31
               Matched Pairs)     Matched Pairs)     Matched Pairs)

COOP ENG       No                 No                 No

CEEB ENG       Yes, Controls      No                 No
               Excelled

THEME TOTAL    No                 No                 No




Among the 166 matched pairs who were tested in January,
1964, were the 113 matched pairs whose performance is reported
for May, 1964, and among the 113 matched pairs were the 31
matched pairs whose performance is reported for May, 1965. The
investigators report in the following table the January, 1964,
performance of the subsamples of 113 and 31 matched pairs, and
the May, 1964, performance of the subsample of 31 matched pairs.



NINE SECONDARY COMPARISONS OF EXPERIMENTAL
SUBSAMPLE AND CONTROL SUBSAMPLE:

Was the null hypothesis rejected?



This table reveals that in the nine secondary comparisons
two instances of significant difference appeared. The subsample
of 113 matched pairs which persisted through May, 1964, showed a
statistically significant difference in favor of the control
students on their January, 1964, performance on CEEB, the
objective test on which the 166 control students presented their
sole instance of a statistically significant difference. The
subsample of 31 matched pairs who persisted to the end of the
second year of the investigation achieved a significant
difference on the January, 1964, performance on COOP. One may
summarize by saying that in eighteen comparisons of performance--nine
for the main subgroups and nine for the subsamples--the null
hypothesis was rejected three times.



Performance by Sex and by Ability Level 

The investigators explored selected factors in addition to 
those involved in the basic comparisons. One of these was the
variation of performance by sex. Females as groups consistently 
performed somewhat better than males on the criterion measures. 

As sex was used as one of the matching criteria in the present
investigation, the ratio of males to females was the same in the
experimental and the control subgroups. If these ratios had not
been kept constant, an observed superiority for a subgroup could
have been improperly attributed to the treatment rather than to
the ratio between sexes. An important conclusion is that,
in investigations concerning competence in composition, the ratio
between sexes must be taken into account in the groups whose
performance is being studied.

The second factor concerning which the investigators made a 
special analysis was that of gains by ability levels. It is 
often assumed that the greatest improvement will be shown by the 
lower-average group, as they presumably have not only the
"capacity" to improve but "room" on the evaluative instruments in
which to show their progress. Conversely, the students at the
upper levels cannot go much higher, and may even decline, at
least as a result of the phenomenon of regression. This general
assumption did not hold in the present study. Rather, the gains 
at the upper ability levels were greater than at the lower 
ability levels when the sample was segmented on the basis of ACT 
English, a test students took before they entered college. It 
would appear that the disparity in performance between the
"better" and the "worse" students in composition tends to widen
somewhat during the first year of college. 



Observations 



The first observation is that the only testing occasion on
which significant differences between subgroups were found was in
January, 1964, at the end of the participants' first semester of
college. In each instance, the difference was in favor of the
control subgroup on an objective test. The subgroups of 166
matched pairs and 113 matched pairs showed this difference on
CEEB, and the subsample of 31 on COOP. Without now speculating
about the difference between these two instruments, one may
observe that this is the only testing occasion at which the
students receiving instruction in freshman composition showed a
significant superiority over the students not receiving such
instruction. By the end of the second semester, the experimental
students were not statistically different from the control students.
Such performance suggests that the advantage of one semester of 
instruction in composition apparently disappears by the end of 
the first year even when instruction continues during the second 
semester. Put another way, it suggests that instruction hastens 
a development which will occur eventually through maturation or 
some other influence. 

Related to the above is an observation concerning the performance
in May, 1964, and in May, 1965, of the 31 matched pairs
who persisted. Both the experimental subgroup and the control 
subgroup showed a decline in performance during the second year 
of college on CEEB. For the experimental subgroup, the decline 
was statistically significant. If, during the sophomore year, 
there is a decline in writing ability as measured by CEEB, it 
appears more likely to occur among those who have not had instruc- 
tion in college freshman composition than among those who have 
had such instruction. This second-year evidence suggests that 
perhaps students reach a plateau of performance at the end of the 
freshman year. These are straws in the wind, observations made 
on the basis of one objective test, CEEB. Nevertheless, they are
interesting hints, if nothing more. 



SUMMARY 



This is an interim report of Research Project 2188 as
amended, which investigates the effectiveness of college freshman
composition. The objective of the project is to test two
hypotheses: (1) that the writing performance of students completing
freshman composition does not differ significantly from the
writing of students not taking freshman composition when both
have been in college the same length of time, and (2) that
replication of the experiment at several institutions will support the
first hypothesis. This interim report covers the pilot phase of
the investigation, developed at the State College of Iowa 1963-65,
and relates to the first hypothesis only. Evidence of the test
of the second hypothesis will be presented in the final report.

For the investigation, some students taking composition were
matched with students not taking composition on the basis of age,
sex, theme score, CEEB and COOP. The two objective tests (COOP
and CEEB) and the theme were the criterion measures. Students
were tested at the start (210 pairs), at the end of the first
semester (166 pairs), at the end of the second semester (113
pairs), and at the end of the fourth semester (31 pairs). The
themes were evaluated by teams selected by Fred Godshalk,
Chairman of Test Development in Humanities at the Educational Testing
Service, from the pool of theme readers used by ETS for college
entrance and advanced placement.

Results sustained the first hypothesis, that the
writing performance of a group of students who have had
composition (control) does not differ significantly from that of
students who have had no composition (experimental). Of nine
main comparisons--COOP, CEEB, and THEME on each of three occasions
(end of first semester, end of second semester, end of fourth
semester)--the null hypothesis was supported on eight, the one
exception being that the control group excelled on the CEEB at
the end of the first semester; at the end of the second semester
and at the end of the fourth semester, there was no significant
difference.

Two other factors were examined: performance by sex and by
ability level. Females performed consistently better than males
on all criteria. From this study, it is inferred that the ratio
between the sexes must be taken into account when writing
competence is a factor in a research [...]
segments were determined by scores on ACT English. On this basis,
obtained gains at upper-ability levels were somewhat greater than
those at lower-ability levels. In this study, it appears that
disparity in performance between upper and lower ability students
tends to increase during the first year of college.

The only testing occasion when there was a significant difference
between control and experimental subgroups was at the end
of the first semester on an objective test. This difference 
favored the control subgroup. This advantage disappeared by the 
end of the second semester. Instruction seems to hasten a devel- 
opment in writing achievement which will occur anyway as the 
result of instruction in a college environment or some other
factor. Based again on an objective test only (CEEB), there is a
decline in performance during the second year, a decline which 
seems greater for those who have had no writing instruction than 
for those who have had such instruction.



APPENDIX A



Theme Topics and Instructions 

The principles followed in selecting topics, the use of a
single topic on each test administration, and the equivalence of
topics actually employed need to be discussed briefly.

Three criteria were established in selecting topics for the 
theme tests: the topic must be of a middle level of abstraction,
it must be related to the students' experience, and it must call 
for an individual rather than a stock response. A middle level 
of abstraction avoided favoring either the students who were 
skillful in exploring general principles or the students who hap- 
pened to have special knowledge related to a specific topic. A 
topic related to the students' experience and knowledge allowed 
them to support and illustrate their general statements with 
particulars readily available to them. A topic calling for an 
individual rather than a stock response provided a test of the 
students' ability to establish and support an original thesis.

The use of a single topic rather than a choice among several 
topics on each testing occasion avoided the introduction of an 
additional variable whose influence would be difficult to esti- 
mate. Such a restriction seemed justified by the fact that the 
students' performance as individuals was not under investigation. 
There is no reason to believe that if the students had had a 
choice of topics, comparison of their group performance would 
have been different from that resulting from a single topic. 

Equivalence of topics across testing occasions was not vital, 
as students' change scores on theme performance were not consid- 
ered in the conclusions in this study. Though it was hoped that 
the topics used would be comparable to one another, any lack of
similarity which may be present cannot be used meaningfully in 
speculation about the results achieved. The subgroups were com- 
pared with one another on their performance at each testing 
occasion. Changes from occasion to occasion within subgroups 
were not investigated. 

On the following pages are the instructions and theme topics 
for the various testing sessions. The complete instruction 
sheets, with places for the readers' ratings, the name and number 
of the student, and the like have not been reproduced as these 
details are irrelevant and their reproduction difficult. It 
should be noted, however, that the original instruction sheets 
were so arranged that the graders could learn neither the student's 
name nor the date on which the paper was written, and the second 
reader could not see the rating given the paper by the first
reader.



(Theme Instructions for September 1963)



THEME INSTRUCTIONS 

1. The paper which you are about to write will be judged on your 
success in presenting your thoughts in a clear, unified, well- 
organized manner, observing the conventions of standard 
written English. You should think about the topic until you
have determined the idea you want to convey to the reader and 
the general procedure you will follow in doing so. Then you 
may write your paper. Do not hesitate to make a brief outline 
if you desire to do so (use the back of this sheet). An outline
is not required.

2. You should be as neat as you can, but you should not hesitate 
to make changes if you believe them to be necessary. You do 
not have to make a fair copy. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. You may write in pen or in pencil, but pen is preferred.

5. Be certain that your NAME IS ON EVERY SHEET, in the UPPER 
RIGHT-HAND CORNER, and that it, as well as the rest of your 
writing, is as legible as you can make it. 

6. Turn in all of the paper given to you. 

7. LENGTH: 300-500 words.



TOPIC 

Few question the idea that loyalty is a virtue. However,
there are occasions when loyalties conflict. For example, 
loyalty to one’s family or school may conflict with loyalty to 
one’s friends; loyalty to an ideal may conflict with loyalty to 
the group. Thus one must sometimes choose to be disloyal to one 
thing in order to be loyal to another. 

Attempt to determine a principle which you feel would be
useful in making such a choice, using examples from your own
experience and observation to indicate how you arrived at the
principle you recommend.



(Theme Instructions for January 1964)



THEME INSTRUCTIONS 

1. The paper which you are about to write will be judged on your
success in presenting your thoughts in a clear, unified, well-
organized manner, observing the conventions of standard
written English. You should think about the topic until you
have determined what idea you want to convey to the reader and
the general procedure you will follow in doing so. Then you
may write your paper. Do not hesitate to make a brief outline
if you desire to do so (use the back of this sheet). An outline
is not required.

2. You should write as neatly and legibly as you can, but you 
should not hesitate to make changes between the lines if you 
believe them to be necessary. You do not have to copy the 
paper over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. You must write with INK OR BALL-POINT PEN.

5. Be certain to write your STUDENT NUMBER in each of the blanks
(two at the top, one at the bottom) provided for it on this
sheet, and in the upper right-hand corner of each page of your
theme.

6. Turn in all of the paper given to you. 

7. LENGTH: 300-500 words.



TOPIC 



Conventional is a word frequently used to refer to customary
attitudes, beliefs or actions. In the United States it is a
convention for men to be clean-shaven, women to wear a certain
amount of make-up, boys to be interested in sports and girls to
be interested in becoming wives and mothers. A person who is
unconventional in some way departs from the conventions of action
or belief of the society of which he is a part.

With this explanation in mind, discuss the following
statement: "Convention is society's safeguard, but also its
potential executioner." To what extent and in what ways do you
agree with this statement? Use examples and details from your
knowledge and experience to support your conclusion.



(Theme Instructions for May 1964)



THEME INSTRUCTIONS 

1. The paper which you are about to write will be judged on your
success in presenting your thoughts in a clear, unified, well- 
organized manner, observing the conventions of standard 
written English. You should think about the topic until you 
have determined what idea you want to convey to the reader and 
the general procedure you will follow in doing so. Then you 
may write your paper. Do not hesitate to make a brief outline 
if you desire to do so (use the back of this sheet). An outline
is not required.

2. You should write as neatly and legibly as you can, but you
should not hesitate to make changes between the lines if you 
believe them to be necessary. You do not have to copy the 
paper over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. You must write with INK OR BALL-POINT PEN.

5. Be certain to write your STUDENT NUMBER in each of the blanks
(two at the top, one at the bottom) provided for it on this
sheet, and in the upper right-hand corner of each page of your
theme.

6. Turn in all of the paper given to you. 

7. LENGTH: 300-500 words.



TOPIC 



I returned and saw under the sun, that the race is not to the
swift, nor the battle to the strong, neither yet bread to the
wise, nor yet riches to men of understanding, nor yet favor to
men of skill; but time and chance happeneth to them all.

-- Ecclesiastes 9:11

Using your experience and observation, indicate why you agree 
or disagree with the statement made in this quotation. 



(Theme Instructions for May 1965) 



THEME INSTRUCTIONS 

1. The paper which you are about to write will be judged on your
success in presenting your thoughts in a clear, unified, well-
organized manner, observing the conventions of standard
written English. You should think about the topic until you
have determined what idea you want to convey to the reader and
the general procedure you will follow in doing so. Then you
may write your paper. Do not hesitate to make a brief outline
if you desire to do so (use the back of this sheet). An outline
is not required.

2. You should write as neatly and legibly as you can, but you 
should not hesitate to make changes between the lines if you 
believe them to be necessary. You do not have to copy the 
paper over. 

3. WRITE ON ONE SIDE OF THE PAPER ONLY. If you need more paper, 
ask for it. 

4. Begin on the third line of the first sheet, and WRITE ON EVERY
LINE THEREAFTER.

5. You must write with INK or BALL-POINT PEN. 

6. Be certain to write your STUDENT NUMBER in the blank provided
at the top of this instruction sheet in the upper left-hand
corner under the Total Score box. It should also be written
on each page of your theme. Do NOT write your name, or the
name of your school, in any place other than the blank
provided at the bottom of this sheet.



7. Turn in all of the paper given to you. 

8. You must stay at least one hour and fifteen minutes. 

9. LENGTH: 300-500 words. 



TOPIC 

As society becomes increasingly complex, the number of people 
upon whom we are dependent Increases. Daniel Boone killed a bear 
and ate it. When we buy steak, we purchase the services of the 
person who produced the animal, the person who fattened it, the 
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person who took it to market, the packing company which bought it, 
slaughtered it, and dressed it, the trucker who transported it to 
the store from which we bought it, and, of course, the grocer 
himself. Each person must do his part if we are to have the 
steak. Even this picture is greatly over-simplified. There are, 
for example, the gasoline which fueled the truck and the truck 
itself. Considering the interdependence illustrated by the story 
of the steak, how free are we to guide our own lives? Are we
liberated from stalking, killing, skinning, and cleaning our
dinner, or are we robbed of our independence? Can we say, as
Henley did, "I am the master of my fate; / I am the captain of my
soul"? Does modern technology liberate us or dominate us?

Present your opinion, based upon your knowledge, observation, and
experience.



APPENDIX B 



Choice of Experimental Design 

In planning research, the most complex questions are those
concerned with the choice of experimental design. The questions
are both theoretical and functional. These two kinds of consid- 
eration come together when one finally must decide the best way, 
under the circumstances in which a given study will be made, to 
collect and analyze data for meaningful samples of students. In 
the present study, three circumstances dictated the choice of a 
matched pairs design. 

The first circumstance was the college administration's 
stipulation that students who were to receive the experimental 
treatment be informed of the fact prior to their registration.
It seemed essential that such students, their parents, and the
faculty advisors be given advance information about the purpose
of the research and its impact on them. These experimental stu- 
dents would not receive instruction in freshman composition — a 
major departure from normal college experience. Given the faith
of students in the importance of composition (14:46), to have
denied them enrollment on registration day without prior warning 
could have induced anxiety and resentment, possibly producing a 
kind of "reverse" Hawthorne effect. Added to this would have 
been confusion in registration, irritation among advisors, and
concern among parents.

Thus the investigators were compelled to select, in advance
of September registration, the students who would receive the 
experimental treatment. As described on page Vf, this procedure 
involved selecting a pool of students from those who, by July 1,
1963, had met admission requirements and expressed their inten- 
tion to enroll in the State College of Iowa. There was, of 
course, no assurance that all of the selected pool would actually 
enroll. This pool, which was a random sample from the July list, 
would not be a random sample of the September freshman class.
That is, some entering freshman students had no opportunity to be
included, and some who were included in the July group did not
enroll.

A second circumstance was the duration of the investigation. 
The experimental design called for the students to be tested 
through the end of their sophomore year. That relatively heavy 
attrition would occur was certain;* that it would have an equal
effect on both treatment groups seemed unlikely. Among other
considerations, the control students would be enrolled in a course
which frequently causes students trouble, while the experimental
students would not. In any event, the possibility that attrition
would occur in such a way that the two treatment groups would 
become progressively dissimilar could not be ignored. 

Related to the attrition problem was the importance of maintaining
the same ratio of males to females in both of the subgroups.
The investigators believed, and their belief is supported
by data subsequently examined (see page 56), that females would 
perform somewhat better than males on measures of composition 
ability. Should the ratio between sexes in one group become sub- 
stantially different from the ratio in the other group, the like- 
lihood of distorted results would be present. 

A third circumstance was the audience which would read the 
research. As the investigation concerns the effectiveness of a 
course usually taught in departments of English, members of 
English departments would be the group for whom the report was 
primarily intended. It seems fair to say that such an audience 
would have considerable difficulty in following the intricacies 
of analysis of covariance. Though this consideration may at 
first seem somewhat frivolous, its pertinence to the potential 
impact of the project is nonetheless real. 

In the light of these circumstances, the investigators
became convinced that the matched-pairs design should be employed.
Matching after September registration ensured a list of students
who were actually enrolled. Use of the matched pairs design with 
sex as one criterion made certain that the ratio between males 
and females would be the same for both subgroups not only at the 
beginning, but at any subsequent point in the investigation. Use 
of matched pairs minimized the possibility that in the attrition 
which would occur over the life of the experiment some factor 
would operate unequally to reduce the similarity of the subgroups. 
Finally, use of matched pairs enabled the investigators to present 
results in a manner which would make them readily available to 
members of English departments and directors of freshman composi- 
tion. 
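
The reasoning about attrition and the sex ratio can be illustrated with a brief sketch. It is an illustration only, not the investigators' procedure: the student identifiers, the helper names, and the rule that a pair is dropped as soon as either member leaves are assumptions made for the example.

```python
from collections import Counter

# Hypothetical matched pairs: (experimental student, control student).
# Each student is (id, sex); by design both members of a pair share a sex.
pairs = [
    (("E1", "F"), ("C1", "F")),
    (("E2", "M"), ("C2", "M")),
    (("E3", "F"), ("C3", "F")),
    (("E4", "M"), ("C4", "M")),
]


def surviving_pairs(pairs, dropped_ids):
    """Keep a pair only if neither of its members has left the study."""
    return [
        (exp, ctl)
        for exp, ctl in pairs
        if exp[0] not in dropped_ids and ctl[0] not in dropped_ids
    ]


def sex_counts(pairs, side):
    """Count students by sex on one side (0 = experimental, 1 = control)."""
    return Counter(pair[side][1] for pair in pairs)


# Assumed attrition: one control student and one experimental student leave.
remaining = surviving_pairs(pairs, dropped_ids={"C2", "E3"})
```

Because pairs are removed whole, `sex_counts(remaining, 0)` and `sex_counts(remaining, 1)` are equal no matter which students drop out, which is the property the paragraph above relies upon.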



*The Registrar of the State College of Iowa estimates that the
attrition for the freshman class between September, 1963, and
May, 1964, was on the order of 19 per cent, and the attrition
between September, 1963, and May, 1965, was approximately 40 per
cent.



The investigators could, of course, have set up the subgroups
from among the students whose data were available in July, taking 
first a random sample of the total group, pairing them, and then 
for each matched pair of students randomly assigning one member 
of the pair to the experimental treatment and the other member of 
the pair to the control treatment. However, in July the only 
pertinent test data available for the students was their performance
on ACT English. As the investigators wished to match as
closely as possible, they decided to wait until more tests could
be administered during the fall semester orientation period.
Doing so permitted matching, as reported on page 18, by age, sex,
theme performance, and a score derived from performance on the 
CEEB and COOP. This precision in matching provided increased 
confidence in the similarity between the two treatment groups. 
Closeness in matching was also facilitated by the fact that the
supply of subjects was greater in September than it was in July. 
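
A matching step of the kind described above might be sketched as follows. This is a sketch under stated assumptions and does not reproduce the investigators' actual rules: the `Student` fields, the unweighted distance, and the greedy nearest-candidate search are all hypothetical choices made for illustration, and the scores are invented.

```python
from dataclasses import dataclass


@dataclass
class Student:
    sid: str          # hypothetical student number
    sex: str
    age: int
    theme: float      # theme rating
    composite: float  # score derived from CEEB and COOP performance


def distance(a, b):
    """Assumed closeness measure: unweighted sum of absolute differences."""
    return (
        abs(a.age - b.age)
        + abs(a.theme - b.theme)
        + abs(a.composite - b.composite)
    )


def match_pairs(experimental, controls):
    """Give each experimental student the closest unused control of the same sex."""
    available = list(controls)
    pairs = []
    for exp in experimental:
        candidates = [c for c in available if c.sex == exp.sex]
        if not candidates:
            continue  # no same-sex control left; the student stays unmatched
        best = min(candidates, key=lambda c: distance(exp, c))
        available.remove(best)
        pairs.append((exp, best))
    return pairs


experimental = [Student("E1", "F", 18, 6, 50), Student("E2", "M", 18, 4, 42)]
controls = [
    Student("C1", "M", 18, 4, 43),
    Student("C2", "F", 19, 6, 51),
    Student("C3", "F", 18, 6, 50.5),
    Student("C4", "M", 20, 7, 60),
]
matched = match_pairs(experimental, controls)
```

A larger supply of candidate controls gives `min` more students to choose from, which is one way to read the remark that closeness in matching was helped by the greater number of subjects available in September.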

Three additional points. Since there were only two treat- 
ment groups, the matched pairs approach was more feasible than if 
there had been several treatment groups. Secondly, the investi- 
gators did not have to use, indeed did not wish to use, intact 
classroom groups for the control treatment. Finally, in methods 
experiments generally, random samples of a real population are
not attainable. Near-randomness is achieved only in the beginning 
stages, and not in the groups which actually complete the exper- 
imental period. 

Procedure for Evaluating Themes

Prior to each reading session, Mr. Jewell would send to Mr.
Godshalk about forty themes, selected at random. From this sample
Mr. Godshalk would determine the general nature of the total
set of themes. He would choose a number of themes that in his
judgment were typical of range and treatment, and Mr. Jewell
would have these duplicated. These became the sample themes used 
during the reading as practice themes. 

Mr. Godshalk's main responsibility when the raters (the
smallest number was nine) had assembled was to communicate to 
them the criteria for evaluating the papers. First, he would 
have Mr. Cowley and Mr. Jewell describe the purpose of the
investigation, the circumstances under which the papers had been
written, and the students who had written them. He would then 
explain the rating scale. When all questions concerning its 
application had been answered, he would distribute several sample 
themes to be rated. After he had made a tally of the various 
values assigned to these papers, he would allow individuals to
explain their ratings or to question his rating. If a rater 
seemed to be over-reacting to something in the papers, something 
which Mr. Godshalk believed from examination of the sample papers
was typical, he would so inform the readers and caution them
against misinterpreting particular aspects of the papers. For 
example, on the loyalty topic, he felt it useful to point out 
that Iowa has many strongly religious communities, so that the 
use of religious principles in response to the theme topic was 
the reflection of sincere belief and not pious platitude to 
impress the reader.

Before setting the readers to work in earnest, he would 
remind them that since they were experienced readers their first 
judgment of a theme as a whole was probably as valid as any sub- 
sequent judgment they might make of the same paper. Therefore, 
they were not to pause and consider but were to read and respond. 
As the rating session progressed, Mr. Godshalk would note whether 
any particular rater seemed to judge consistently in a way different
from the other raters. At relatively frequent intervals,
he would interrupt the reading to allow the readers to relax and 
would read aloud papers which had been passed on to him by indi- 
vidual readers. Frequently, these papers posed special problems 
which Mr. Godshalk would have the group discuss, always making 
clear his own judgment. The goal of the initial orientation and
of the subsequent breaks in the reading was for Mr. Godshalk to 
convey to the readers his criteria and to get them to standardize
their scoring so that they would agree in their ratings. The
reading would be most "perfect" when all of the readers rated
the papers in the same way that Mr. Godshalk would rate them. In
practice his standards would be slightly altered if a consensus 
indicated they should be. Thus, the validity of the ratings
could be no greater than the validity of Mr. Godshalk's criteria
as modified on occasion by discussion with the readers.
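
One way to picture the check for a rater who judges consistently differently from the others is sketched below. It is offered only as an illustration: the rater labels, the scores, and the use of a mean signed deviation from the other raters' average are assumptions, not a description of how Mr. Godshalk actually monitored the readers.

```python
from statistics import mean

# Hypothetical ratings: ratings[rater][i] is that rater's score on sample theme i.
ratings = {
    "A": [3, 2, 4, 1],
    "B": [3, 2, 3, 1],
    "C": [4, 3, 4, 2],
}


def mean_deviation(ratings, rater):
    """Average signed gap between one rater and the mean of the other raters."""
    others = [r for r in ratings if r != rater]
    gaps = []
    for i, score in enumerate(ratings[rater]):
        consensus = mean(ratings[o][i] for o in others)
        gaps.append(score - consensus)
    return mean(gaps)


# A positive value means the rater scores higher than the rest, on average.
deviations = {r: mean_deviation(ratings, r) for r in ratings}
most_divergent = max(deviations, key=lambda r: abs(deviations[r]))
```

With these invented scores, rater "C" stands out as rating higher than the other two on nearly every theme, the sort of pattern that would prompt a word of caution during a break in the reading.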



